





ESTIMATING LINEAR RESTRICTIONS ON REGRESSION COEFFICIENTS
FOR MULTIVARIATE NORMAL DISTRIBUTIONS¹


By T. W. ANDERSON 
Columbia University 


Summary. In this paper linear restrictions on regression coefficients are
studied. Let the $p \times q_2$ matrix of coefficients of regression of the $p$ dependent
variates on $q_2$ of the independent variates be² $\mathrm{B}_2$. Maximum likelihood estimates
of a $p \times m$ matrix $\Gamma$ satisfying $\Gamma'\mathrm{B}_2 = 0$ and certain other conditions are found
under the assumption that the rank of $\mathrm{B}_2$ is $p - m$ and the dependent variates
are normally distributed (Section 2). Confidence regions for $\Gamma$ under various
conditions are obtained (Section 5). The likelihood ratio test of the hypothesis
that the rank of $\mathrm{B}_2$ is a given number is obtained (Section 3). A test of the
hypothesis that $\Gamma$ is a certain matrix is given (Section 4). These results are
applied to the "$q$-sample problem" (Section 7) and are extended for certain
econometric models (Section 6).


1. Introduction. 


1.1. Univariate analysis of variance. A large number of problems of univariate
statistics can be put into the form of analysis of variance or regression analysis.
We assume that

(1.1) $\mathcal{E} x_\alpha = \beta' z_\alpha$,

where $\beta$ and $z_\alpha$ are column vectors³ of $q$ components, that
$\mathcal{E}(x_\alpha - \beta' z_\alpha)^2 = \sigma^2$, and that $x_\alpha$ is uncorrelated with $x_\gamma$ ($\alpha \neq \gamma$).
On the basis of a sample, that is, a set⁴ $x_1, \ldots, x_N$; $z_1, \ldots, z_N$ (where there
are $q$ linearly independent $z_\alpha$), we may test hypotheses about $\beta$, we may obtain
point estimates of $\beta$, or we may find a confidence region for the vector $\beta$. It is
well known that this model is sufficiently general to include the analysis of
variance model for fixed effects. The usual point estimate $b$ of $\beta$ is defined by




1 By invitation, parts of this paper were presented to the Cleveland Meeting of the
Institute of Mathematical Statistics, December 30, 1948. Some of these results were included
in the thesis [1] submitted to the Mathematics Department of Princeton University in
partial fulfillment of the requirements for the degree of Doctor of Philosophy, June, 1945;
some other results were included in dittoed papers at the Cowles Commission for Research in
Economics (of which organization the author is a research consultant). This paper will be
reprinted as Cowles Commission Paper, New Series, No. 50.

2 We use $\mathrm{B}$ to distinguish the capital "beta" (a matrix of parameters) from $B$ (a matrix
of estimates). Matrices are indicated by boldface capital letters, and vectors are indicated
by boldface lower case letters.

3 $\beta'$ is the transpose of $\beta$.

4 We use the same notation for observations as for random variables with the hope that
the reader can easily distinguish between the uses on the basis of the context.
(1.2) $b' \sum_{\alpha=1}^{N} z_\alpha z_\alpha' = \sum_{\alpha=1}^{N} x_\alpha z_\alpha'$.


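The estimate $s^2$ of $\sigma^2$ is given by

(1.3) $(N - q)s^2 = \sum_{\alpha=1}^{N} (x_\alpha - b'z_\alpha)^2 = \sum_{\alpha=1}^{N} x_\alpha^2 - b'\left(\sum_{\alpha=1}^{N} z_\alpha z_\alpha'\right) b$.
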
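Consider testing the hypothesis $\beta_2 = 0$, where $\beta_2$ is a subvector of $\beta$ with
$q_2$ components; that is, $\beta' = (\beta_1' \;\; \beta_2')$. We partition $b$ and $z_\alpha$ similarly as
$b' = (b_1' \;\; b_2')$ and $z_\alpha' = (z_{1\alpha}' \;\; z_{2\alpha}')$. We use the statistic

(1.4) $F = \dfrac{b_2' Q b_2}{q_2 s^2}$,

where

(1.5) $Q = \sum_{\alpha=1}^{N} z_{2\alpha} z_{2\alpha}' - \sum_{\alpha=1}^{N} z_{2\alpha} z_{1\alpha}' \left(\sum_{\alpha=1}^{N} z_{1\alpha} z_{1\alpha}'\right)^{-1} \sum_{\alpha=1}^{N} z_{1\alpha} z_{2\alpha}'$.
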
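This statistic has the F-distribution with $q_2$ and $N - q$ degrees of freedom if the
null hypothesis is true and if the $\{x_\alpha\}$ are normally distributed. The above statistic
is equivalent to

(1.6) $\dfrac{1}{\dfrac{q_2}{N-q}F + 1} = \dfrac{\sum_{\alpha=1}^{N} (x_\alpha - b'z_\alpha)^2}{\sum_{\alpha=1}^{N} (x_\alpha - b_1^{*\prime} z_{1\alpha})^2}$,

where $b_1^*$ is defined by

(1.7) $b_1^{*\prime} \sum_{\alpha=1}^{N} z_{1\alpha} z_{1\alpha}' = \sum_{\alpha=1}^{N} x_\alpha z_{1\alpha}'$.
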
The numerator is proportional to the estimate of $\sigma^2$ when we do not believe the
hypothesis to be true, and the denominator is proportional to the estimate of $\sigma^2$
when we do believe the hypothesis to be true. We reject the null hypothesis
when the observed $F$ is greater than $F_{q_2, N-q}(\varepsilon)$, the significance point of the
F-distribution with $q_2$ and $N - q$ degrees of freedom corresponding to a
significance level $\varepsilon$.

We can also test the hypothesis that $\beta_2 = \beta_2^0$, any arbitrary given vector.
A confidence region for $\beta_2$ consists of all those $\beta_2^0$ such that the corresponding
hypothesis is not rejected on the basis of our sample.
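
As a minimal numerical sketch of Section 1.1, the following NumPy code evaluates (1.2) through (1.7) on synthetic data and compares the two forms of the statistic. The sample size, the dimensions, and the coefficient vector are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)            # synthetic data; all numbers are illustrative
N, q1, q2 = 40, 2, 3
q = q1 + q2
Z = rng.normal(size=(N, q))               # rows are z_alpha' = (z_1alpha', z_2alpha')
beta = np.array([1.0, -2.0, 0.0, 0.0, 0.0])   # beta_2 = 0, so the null hypothesis holds
x = Z @ beta + rng.normal(size=N)

# (1.2): b solves (sum z z') b = sum z x
b = np.linalg.solve(Z.T @ Z, Z.T @ x)

# (1.3): (N - q) s^2 is the residual sum of squares of the full fit
rss_full = np.sum((x - Z @ b) ** 2)
s2 = rss_full / (N - q)

# (1.5): Q = Z2'Z2 - Z2'Z1 (Z1'Z1)^{-1} Z1'Z2
Z1, Z2 = Z[:, :q1], Z[:, q1:]
Q = Z2.T @ Z2 - Z2.T @ Z1 @ np.linalg.solve(Z1.T @ Z1, Z1.T @ Z2)

# (1.4): the F statistic for beta_2 = 0
b2 = b[q1:]
F = (b2 @ Q @ b2) / (q2 * s2)

# (1.7) and (1.6): fit on z_1 only, then form the ratio of residual sums of squares
b1_star = np.linalg.solve(Z1.T @ Z1, Z1.T @ x)
rss_restricted = np.sum((x - Z1 @ b1_star) ** 2)
ratio = rss_full / rss_restricted

print(F, ratio, 1.0 / (1.0 + q2 * F / (N - q)))
```

The last two printed numbers should agree up to rounding, which is the equivalence stated in (1.6). The significance point $F_{q_2, N-q}(\varepsilon)$ is not computed here; it would be taken from tables of the F-distribution.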

1.2. Multivariate analysis of variance. Now let us turn to the generalization
of the analysis of variance to be used in the treatment of a vector variate. The
expected value of a vector variate $x_\alpha$ with $p$ components is assumed to be

(1.8) $\mathcal{E} x_\alpha = \mathrm{B} z_\alpha$,

where $\mathrm{B}$ is a $p \times q$ matrix of regression coefficients. We further assume that

(1.9) $\mathcal{E}(x_\alpha - \mathrm{B}z_\alpha)(x_\alpha - \mathrm{B}z_\alpha)' = \Sigma$,

a positive definite matrix, and that $x_\alpha$ is uncorrelated with $x_\gamma$ ($\alpha \neq \gamma$). As in
the univariate case we may wish to estimate the regression coefficients. The
estimate of each row of $\mathrm{B}$ is of the form (1.2). Thus $B$, the estimate of $\mathrm{B}$, is
defined by

(1.10) $B \sum_{\alpha=1}^{N} z_\alpha z_\alpha' = \sum_{\alpha=1}^{N} x_\alpha z_\alpha'$.

The estimate $S$ of $\Sigma$ is given by

(1.11) $(N - q)S = \sum_{\alpha=1}^{N} (x_\alpha - Bz_\alpha)(x_\alpha - Bz_\alpha)'$.

If the $\{x_\alpha\}$ are normally distributed, $B$ has a normal distribution with mean $\mathrm{B}$, and
$(N - q)S = A$ has a Wishart distribution with covariance matrix $\Sigma$ and $N - q$
degrees of freedom.

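To test the hypothesis $\mathrm{B}_2 = 0$, where $\mathrm{B}_2$ is the second submatrix in
$\mathrm{B} = (\mathrm{B}_1 \;\; \mathrm{B}_2)$, we use a generalization of the statistic
$\left[\dfrac{q_2}{N-q}F + 1\right]^{-1}$ which was used in the univariate case, namely,

(1.12) $U = \dfrac{\left|\sum_{\alpha=1}^{N} (x_\alpha - Bz_\alpha)(x_\alpha - Bz_\alpha)'\right|}{\left|\sum_{\alpha=1}^{N} (x_\alpha - B_1^* z_{1\alpha})(x_\alpha - B_1^* z_{1\alpha})'\right|}$,

where $B_1^*$ is defined by

(1.13) $B_1^* \sum_{\alpha=1}^{N} z_{1\alpha} z_{1\alpha}' = \sum_{\alpha=1}^{N} x_\alpha z_{1\alpha}'$.
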

The matrix in the numerator of $U$ is proportional to the estimate of $\Sigma$ when the
null hypothesis is not believed true, and the matrix in the denominator of $U$ is
proportional to the estimate of $\Sigma$ when the null hypothesis is believed true. We
reject the hypothesis $\mathrm{B}_2 = 0$ when the observed $U$ is less than $U_{p,q_2,N-q}(\varepsilon)$, the
significance point at significance level $\varepsilon$. The distribution of $U$ has been given by
Wilks in many special cases [16]; Rao has given an approximation to $U_{p,q_2,N-q}(\varepsilon)$
based on an asymptotic expansion of the distribution of $\log U$ [14]. The
likelihood ratio criterion is $U^{N/2}$.

We can also test the hypothesis that $\mathrm{B}_2 = \mathrm{B}_2^0$, an arbitrary given matrix.
A confidence region for $\mathrm{B}_2$ consists of all those $\mathrm{B}_2^0$ such that the corresponding
hypothesis is not rejected on the basis of our sample.
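
The multivariate quantities (1.10) through (1.13) can be sketched in the same way. The data, dimensions, and variable names below are assumptions made for illustration. The second evaluation of $U$ uses the standard identity that the restricted residual matrix equals $A + B_2 Q B_2'$, which is the matrix form of the sum-of-squares decomposition for the regression on $z_{2\alpha}$.

```python
import numpy as np

rng = np.random.default_rng(1)            # synthetic data; all numbers are illustrative
N, p, q1, q2 = 60, 3, 2, 2
q = q1 + q2
Z = rng.normal(size=(N, q))               # rows are z_alpha'
Beta = rng.normal(size=(p, q))            # hypothetical true coefficient matrix
X = Z @ Beta.T + rng.normal(size=(N, p))  # rows are x_alpha'

# (1.10): B (sum z z') = sum x z'
B = np.linalg.solve(Z.T @ Z, Z.T @ X).T   # p x q

# (1.11): A = (N - q) S is the residual cross-product matrix of the full fit
R = X - Z @ B.T
A = R.T @ R
S = A / (N - q)

# (1.13): regression on z_1 alone
Z1, Z2 = Z[:, :q1], Z[:, q1:]
B1_star = np.linalg.solve(Z1.T @ Z1, Z1.T @ X).T
R1 = X - Z1 @ B1_star.T
A_restricted = R1.T @ R1

# (1.12): ratio of determinants
U = np.linalg.det(A) / np.linalg.det(A_restricted)

# Same statistic through Q of (1.5) and the last q2 columns of B
Q = Z2.T @ Z2 - Z2.T @ Z1 @ np.linalg.solve(Z1.T @ Z1, Z1.T @ Z2)
B2 = B[:, q1:]
U_alt = np.linalg.det(A) / np.linalg.det(A + B2 @ Q @ B2.T)

print(U, U_alt)
```

Small values of $U$ speak against $\mathrm{B}_2 = 0$; the significance point $U_{p,q_2,N-q}(\varepsilon)$ itself is not computed in this sketch.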

1.3. Rank of the regression matrix; linear restrictions. If $\mathrm{B}_2 \neq 0$, there enters
into the multivariate case a new feature which does not appear in the univariate
case. There may be some of the rows of $\mathrm{B}_2$ that are zero; that is, there may be
some components of $x_\alpha$ that have expected values independent of $z_{2\alpha}$. More
generally, all of the elements of $\mathrm{B}_2$ may be different from zero, but the rank of
$\mathrm{B}_2$ may be less than the maximum possible. That implies that it is possible
to take a linear combination of components of $x_\alpha$ such that the expected value
of this linear combination is independent of $z_{2\alpha}$. If the rank of $\mathrm{B}_2$ is less than $p$,
there is a vector $\gamma$ (of $p$ components) such that

(1.14) $\gamma' \mathrm{B}_2 = 0$.

From (1.8) we obtain

(1.15) $\mathcal{E}\gamma' x_\alpha = \gamma' \mathrm{B} z_\alpha = \gamma' \mathrm{B}_1 z_{1\alpha}$.

In general, if $r$ is the rank of $\mathrm{B}_2$ there are $p - r$ linearly independent vectors
$\gamma$ satisfying (1.14).

In this paper we are primarily interested in estimating $p - r$ linearly
independent vectors satisfying (1.14). In Section 2 we find that the maximum
likelihood estimates of these vectors (under certain normalization conditions)
are the characteristic vectors of $B_2 Q B_2'$ in the metric of $S$. In Section 5 we find
confidence regions for these vectors using the theory summarized in Section 1.2.

1.4. Examples. A number of multivariate problems can be thrown into the
above form, some naturally, some a little unnaturally. As a simple example,
consider a sample $U_1, \ldots, U_n$ from $N(\mu, \Sigma)$ and a sample of the same size
$V_1, \ldots, V_n$ from $N(\nu, \Sigma)$, where $N(\mu, \Sigma)$ denotes the normal distribution with
mean $\mu$ and covariance matrix $\Sigma$. We can describe this by the above model.
Let $x_\alpha = U_\alpha$, $x_{n+\alpha} = V_\alpha$ ($\alpha = 1, \ldots, n$). Let $z_{1\alpha} = 1$ ($\alpha = 1, \ldots, 2n = N$);
and let $z_{2\alpha} = 1$ for $\alpha = 1, \ldots, n$, and $z_{2\alpha} = -1$ for $\alpha = n + 1, \ldots, 2n$. Then
the regression model holds with the first column of $\mathrm{B}$ being $\tfrac12(\mu + \nu)$ and the
second being $\tfrac12(\mu - \nu)$. The hypothesis $\mu = \nu$ is transformed into the hypothesis
of the regression of $x_\alpha$ on $z_{2\alpha}$ being zero. In this case $\mathrm{B}_2$ consists of one column
and is either of rank 0 or 1. The $T^2$ test that it is of rank 0 is a special case of
the tests given in Section 3. Estimation of linear restrictions on $\mathrm{B}_2$ is trivial
and is a special case of the treatment in Section 2.
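
The two-sample coding can be checked directly. In the sketch below the sample size, dimension, and population means are made-up values; the point is only that the least squares estimate (1.10) with this design reproduces half the sum and half the difference of the two sample means.

```python
import numpy as np

rng = np.random.default_rng(2)            # synthetic samples; means are illustrative
n, p = 25, 3
U_sample = rng.normal(loc=[1.0, 0.0, 2.0], size=(n, p))   # rows U_1, ..., U_n
V_sample = rng.normal(loc=[0.0, 1.0, 2.0], size=(n, p))   # rows V_1, ..., V_n

# x_alpha = U_alpha for alpha <= n, x_{n+alpha} = V_alpha
X = np.vstack([U_sample, V_sample])
z1 = np.ones(2 * n)                                   # z_1alpha = 1 for all alpha
z2 = np.concatenate([np.ones(n), -np.ones(n)])        # z_2alpha = +1, then -1
Z = np.column_stack([z1, z2])

# (1.10): B (sum z z') = sum x z'
B = np.linalg.solve(Z.T @ Z, Z.T @ X).T               # p x 2

u_bar, v_bar = U_sample.mean(axis=0), V_sample.mean(axis=0)
print(B[:, 0] - 0.5 * (u_bar + v_bar))                # should be numerically zero
print(B[:, 1] - 0.5 * (u_bar - v_bar))                # should be numerically zero
```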

In a similar fashion we may treat certain three-sample problems. Suppose
we have samples of size $n$ from $N(\mu_1, \Sigma)$, $N(\mu_2, \Sigma)$, and $N(\mu_3, \Sigma)$. Then $z_{1\alpha}$,
$z_{2\alpha}$, and $z_{3\alpha}$ can be chosen so that the three columns of $\mathrm{B}$ are $\beta_1 = \mu_1 + \mu_2 + \mu_3$,
$\beta_2 = \mu_1 - \mu_2$, $\beta_3 = \mu_2 - \mu_3$. If the means lie on a line, that is, if
$\mu_1 - \mu_2 = k(\mu_2 - \mu_3)$ for some constant $k$, the rank of $(\beta_2 \;\; \beta_3)$ is one instead of two.
In such a case there are $p - 1$ vectors that are annihilated by this $p \times 2$ matrix. The
intersection of the planes perpendicular to these $p - 1$ vectors is, of course,
the line joining the means. This vector is simply proportional to one of the last
two columns of $\mathrm{B}$. In Section 7 we give a more general consideration of "$q$-sample
problems."


Now we shall consider a more elaborate problem that is naturally put into
this pattern. Suppose that the workings of the economic system are such that
the vector of economic variates $x_\alpha$ has a mean $\mathrm{B}z_\alpha$, where $z_\alpha$ consists of
noneconomic variates. $\mathrm{B}z_\alpha$ might be called the vector of "systematic parts" (or
it is called the "reduced form"). Suppose there is a vector $\gamma$ such that $\gamma'\mathrm{B} = 0$.
Then the variate $\gamma' x_\alpha$ has mean zero. Under certain conditions the equation
$\gamma' x_\alpha = v_\alpha$ is called a "structural equation." In many such economic models one
would like to include in the $z_\alpha$ "lagged" values of $x_\alpha$. Although many of the
results in this paper apply to such a case, we shall exclude such considerations
in order to simplify the discussion.


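2. Maximum likelihood estimates of the coefficients of the restrictions. We
assume that $x_\alpha$ is normally distributed with mean (1.8), that is,
$\mathcal{E}x_\alpha = \mathrm{B}_1 z_{1\alpha} + \mathrm{B}_2 z_{2\alpha}$ ($\alpha = 1, \ldots, N$). Since the $\{z_\alpha\}$ are fixed, we can make
the transformation

(2.1) $v_\alpha = z_{2\alpha} - \sum_{\beta=1}^{N} z_{2\beta} z_{1\beta}' \left(\sum_{\beta=1}^{N} z_{1\beta} z_{1\beta}'\right)^{-1} z_{1\alpha}$,

(2.2) $\mathrm{B}_1^* = \mathrm{B}_1 + \mathrm{B}_2 \left(\sum_{\beta=1}^{N} z_{2\beta} z_{1\beta}'\right) \left(\sum_{\beta=1}^{N} z_{1\beta} z_{1\beta}'\right)^{-1}$.

Then the expectation of $x_\alpha$ is

(2.3) $\mathcal{E}x_\alpha = \mathrm{B}_1^* z_{1\alpha} + \mathrm{B}_2 v_\alpha$,

and $z_{1\alpha}$ and $v_\alpha$ are orthogonal (in our sample). We want to estimate the $p \times m$
matrix $\Gamma$ such that

(2.4) $\Gamma' \mathrm{B}_2 = 0$.

In order to avoid trivial estimates we require

(2.5) $\Gamma' \Sigma \Gamma = I$.

There is still an indeterminacy because (2.4) and (2.5) are satisfied if $\Gamma$ is replaced
by the product of $\Gamma$ on the right by an orthogonal matrix.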
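
The transformation (2.1) to (2.3) is easy to verify numerically. The following sketch, with arbitrary illustrative dimensions and random inputs, checks that the transformed variates are orthogonal to the $z_{1\alpha}$ in the sample, that their cross-product matrix equals $Q$ of (1.5), and that the mean vectors are unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)            # arbitrary illustrative inputs
N, p, q1, q2 = 30, 4, 2, 3
Z1 = rng.normal(size=(N, q1))             # rows are z_1alpha'
Z2 = rng.normal(size=(N, q2))             # rows are z_2alpha'

# M = (sum z2 z1') (sum z1 z1')^{-1}, a q2 x q1 matrix
M = (Z2.T @ Z1) @ np.linalg.inv(Z1.T @ Z1)

# (2.1): v_alpha = z_2alpha - M z_1alpha, stored as rows of V
V = Z2 - Z1 @ M.T

# Orthogonality in the sample: sum v z1' = 0
cross = V.T @ Z1

# Q of (1.5) equals sum v v'
Q = Z2.T @ Z2 - Z2.T @ Z1 @ np.linalg.solve(Z1.T @ Z1, Z1.T @ Z2)

# (2.2) and (2.3): the mean of x_alpha is the same in both parameterizations
Beta1 = rng.normal(size=(p, q1))
Beta2 = rng.normal(size=(p, q2))
Beta1_star = Beta1 + Beta2 @ M
mean_original = Z1 @ Beta1.T + Z2 @ Beta2.T
mean_transformed = Z1 @ Beta1_star.T + V @ Beta2.T

print(np.abs(cross).max(), np.abs(V.T @ V - Q).max())
```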

Before proceeding formally with the method of maximum likelihood, let us
consider a more intuitive approach. We can make an analysis of variance for a
linear combination $c'x_\alpha$. Then

(2.6) $\sum_{\alpha=1}^{N} (c'x_\alpha)^2 = c' \sum_{\alpha=1}^{N} x_\alpha x_\alpha' c = c' B_1^* \sum_{\alpha=1}^{N} z_{1\alpha} z_{1\alpha}' B_1^{*\prime} c + c' B_2 \sum_{\alpha=1}^{N} v_\alpha v_\alpha' B_2' c + (N - q) c'Sc$,

where $B_1^*$ is defined by (1.13) and $B_2$, the usual estimate of $\mathrm{B}_2$, is given by
(1.10) or by

(2.7) $B_2 = \sum_{\alpha=1}^{N} x_\alpha v_\alpha' Q^{-1}$,

where $Q$, given by (1.5), is $\sum_{\alpha=1}^{N} v_\alpha v_\alpha'$. The second term on the right in (2.6)
is the "sum of squares of the $v$ effects" and the third term is the "error sum of
squares." Since a restriction $\gamma$ is such that the expected value of a "$v$ effect"
is 0, a good estimate of a $\gamma$ would seem to be the vector $c$ that minimizes the
"$v$ effect sum of squares" relative to the "error sum of squares", that is, the $c$
that minimizes

(2.8) $\phi = \dfrac{c' B_2 Q B_2' c}{c'(N - q)Sc}$.

The minimum ratio is the smallest root of

(2.9) $|B_2 Q B_2' - \phi A| = 0$,

where $A = (N - q)S$, and the vector $c$ is the corresponding characteristic
vector satisfying

(2.10) $(B_2 Q B_2' - \phi A)c = 0$.

If there are $m$ linearly independent restrictions, we use the characteristic vectors
associated with the $m$ smallest roots of (2.10) normalized by

(2.11) $c'Ac = N$.
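
A minimal sketch of this intuitive estimate follows. The helper `roots_and_vectors` is a hypothetical name introduced here; it reduces (2.10) to an ordinary symmetric eigenproblem through a Cholesky factor of $A$, which is one standard way to solve such a problem and is not claimed to be the author's computing procedure. Data and dimensions are synthetic.

```python
import numpy as np

def roots_and_vectors(K, A, N):
    """Solve (K - phi*A) c = 0. Returns the roots in descending order and
    vectors scaled so that C.T @ A @ C = N * I, as in (2.11)."""
    L = np.linalg.cholesky(A)                     # A = L L'
    L_inv = np.linalg.inv(L)
    w, U = np.linalg.eigh(L_inv @ K @ L_inv.T)    # ordinary symmetric problem
    order = np.argsort(w)[::-1]
    return w[order], np.sqrt(N) * (L_inv.T @ U[:, order])

rng = np.random.default_rng(4)                    # synthetic data, illustrative only
N, p, q1, q2 = 80, 3, 1, 3
Z = rng.normal(size=(N, q1 + q2))
Beta2 = rng.normal(size=(p, 2)) @ rng.normal(size=(2, q2))   # rank 2, so m = 1
Beta = np.hstack([rng.normal(size=(p, q1)), Beta2])
X = Z @ Beta.T + rng.normal(size=(N, p))

B = np.linalg.solve(Z.T @ Z, Z.T @ X).T           # (1.10)
R = X - Z @ B.T
A = R.T @ R                                       # A = (N - q) S
Z1, Z2 = Z[:, :q1], Z[:, q1:]
Q = Z2.T @ Z2 - Z2.T @ Z1 @ np.linalg.solve(Z1.T @ Z1, Z1.T @ Z2)
K = B[:, q1:] @ Q @ B[:, q1:].T                   # B_2 Q B_2'

phi, C = roots_and_vectors(K, A, N)
c = C[:, -1]                                      # vector for the smallest root
ratio_at_c = (c @ K @ c) / (c @ A @ c)            # (2.8) evaluated at c

# Compare with the ratio (2.8) at many random directions
trial = rng.normal(size=(1000, p))
num = np.einsum("ij,jk,ik->i", trial, K, trial)
den = np.einsum("ij,jk,ik->i", trial, A, trial)
trial_ratios = num / den

print(phi, ratio_at_c, trial_ratios.min())
```

No random direction should give a ratio below the smallest root, in line with the statement that the minimum of (2.8) is the smallest root of (2.9).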


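THEOREM 1. Suppose $x_\alpha$ (of $p$ components) is distributed according to
$N(\mathrm{B}_1 z_{1\alpha} + \mathrm{B}_2 z_{2\alpha}, \Sigma)$ ($\alpha = 1, \ldots, N$). Suppose $\mathrm{B}_2$ is of rank $p - m$. Then a set of
maximum likelihood estimates of $\mathrm{B}_1$, $\mathrm{B}_2$, $\Sigma$, and $\Gamma$ satisfying (2.4) and (2.5) are

(2.12) $\hat{\mathrm{B}}_1 = \sum_{\alpha} x_\alpha z_{1\alpha}' \left(\sum_{\alpha} z_{1\alpha} z_{1\alpha}'\right)^{-1} - \hat{\mathrm{B}}_2 \sum_{\beta} z_{2\beta} z_{1\beta}' \left(\sum_{\alpha} z_{1\alpha} z_{1\alpha}'\right)^{-1}$,

(2.13) $\hat{\mathrm{B}}_2 = (I - \hat\Sigma \hat\Gamma \hat\Gamma') B_2$,

(2.14) $\hat\Sigma = H + H\hat\Gamma (I + \Phi^*) \Phi^* \hat\Gamma' H$,

(2.15) $\hat\Gamma = (\hat\gamma_{p-m+1}, \ldots, \hat\gamma_p)$,

where $B_2$ is given by (2.7) or (1.10), $H = [(N - q)/N]S$ is given by (1.11), $\Phi^*$ is
the diagonal matrix whose nonzero elements $(\phi_{p-m+1}, \ldots, \phi_p)$ are the $m$ smallest
roots of (2.9), and $\hat\gamma_i$ are the corresponding vectors defined by (2.10) and normalized
by $\hat\gamma_i' H \hat\gamma_i = 1/(1 + \phi_i)$. $\hat\Gamma$ may be multiplied on the right by any orthogonal matrix
to obtain another maximum likelihood estimate of $\Gamma$.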
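
The estimates of Theorem 1 can be assembled in a few lines once the roots and vectors of (2.10) are available. The sketch below is an illustration on synthetic data with assumed dimensions; the Cholesky reduction of the eigenproblem is a numerical convenience and is not taken from the paper. It checks the side conditions (2.4) and (2.5) for the estimates, the rank of $\hat{\mathrm{B}}_2$, and relation (2.29) from the proof.

```python
import numpy as np

rng = np.random.default_rng(5)                    # synthetic data, illustrative only
N, p, q1, q2, m = 200, 4, 2, 3, 2
q = q1 + q2
Z = rng.normal(size=(N, q))
Beta1 = rng.normal(size=(p, q1))
Beta2 = rng.normal(size=(p, p - m)) @ rng.normal(size=(p - m, q2))   # rank p - m
X = Z @ np.hstack([Beta1, Beta2]).T + rng.normal(size=(N, p))

# Usual estimates: B by (1.10), A = (N - q) S by (1.11), Q by (1.5)
B = np.linalg.solve(Z.T @ Z, Z.T @ X).T
R = X - Z @ B.T
A = R.T @ R
H = A / N                                         # H = [(N - q)/N] S
Z1, Z2 = Z[:, :q1], Z[:, q1:]
Q = Z2.T @ Z2 - Z2.T @ Z1 @ np.linalg.solve(Z1.T @ Z1, Z1.T @ Z2)
B2 = B[:, q1:]
K = B2 @ Q @ B2.T

# Roots and vectors of (2.10), with c_i' A c_j = N delta_ij
L_inv = np.linalg.inv(np.linalg.cholesky(A))
w, Uvec = np.linalg.eigh(L_inv @ K @ L_inv.T)
order = np.argsort(w)[::-1]
phi = w[order]                                    # descending
C = np.sqrt(N) * (L_inv.T @ Uvec[:, order])

# (2.15): vectors for the m smallest roots, scaled so gamma' H gamma = 1/(1 + phi)
phi_star = phi[-m:]
Gamma_hat = C[:, -m:] / np.sqrt(1.0 + phi_star)

# (2.14) and (2.13)
Sigma_hat = H + H @ Gamma_hat @ np.diag((1.0 + phi_star) * phi_star) @ Gamma_hat.T @ H
Beta2_hat = (np.eye(p) - Sigma_hat @ Gamma_hat @ Gamma_hat.T) @ B2

# (2.12): B_1-hat from the regression on z_1 and the correction through B_2-hat
M = (Z2.T @ Z1) @ np.linalg.inv(Z1.T @ Z1)
Beta1_hat = np.linalg.solve(Z1.T @ Z1, Z1.T @ X).T - Beta2_hat @ M

rank_Beta2_hat = np.linalg.matrix_rank(Beta2_hat, tol=1e-8)
print(rank_Beta2_hat, np.abs(Gamma_hat.T @ Beta2_hat).max())
```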

PROOF. We shall maximize the logarithm of the likelihood function⁵

(2.16) $\log L = -\tfrac12 Np \log 2\pi - \tfrac12 N \log|\Sigma| - \tfrac12 \sum_{\alpha=1}^{N} (x_\alpha - \mathrm{B}_1^* z_{1\alpha} - \mathrm{B}_2 v_\alpha)' \Sigma^{-1} (x_\alpha - \mathrm{B}_1^* z_{1\alpha} - \mathrm{B}_2 v_\alpha)$

with respect to $\Sigma$, $\mathrm{B}_1^*$, $\mathrm{B}_2$, and $\Gamma$, subject to restrictions (2.4) and (2.5). Let
$\Phi$ be an $m \times q_2$ matrix, and $\Psi$ an $m \times m$ symmetric matrix, of Lagrange
multipliers. We shall maximize

(2.17) $f = \log L + \mathrm{tr}(\Phi \mathrm{B}_2' \Gamma) + \tfrac12 \mathrm{tr}[\Psi(\Gamma'\Sigma\Gamma - I)]$

by taking partial derivatives and setting these equal to zero ("tr" denoting
trace). The partial derivatives with respect to the elements of $\Gamma$ set equal to
zero are

(2.18) $\hat\Phi \hat{\mathrm{B}}_2' + \hat\Psi \hat\Gamma' \hat\Sigma = 0$.

Multiplication on the right by $\hat\Gamma$ gives

(2.19) $\hat\Phi \hat{\mathrm{B}}_2' \hat\Gamma + \hat\Psi \hat\Gamma' \hat\Sigma \hat\Gamma = 0$.

In view of (2.4) and (2.5), which are to be satisfied by $\hat{\mathrm{B}}_2$, $\hat\Gamma$, and $\hat\Sigma$, this shows
that $\hat\Psi = 0$.

5 The method is similar to that used in [3].

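The partial derivatives of $f$ with respect to the elements of $\Sigma$, $\mathrm{B}_1^*$, $\mathrm{B}_2$, and
$\Gamma$ lead to

(2.20) $N\hat\Sigma = \sum_{\alpha} (x_\alpha - \hat{\mathrm{B}}_1^* z_{1\alpha} - \hat{\mathrm{B}}_2 v_\alpha)(x_\alpha - \hat{\mathrm{B}}_1^* z_{1\alpha} - \hat{\mathrm{B}}_2 v_\alpha)'$,

(2.21) $\hat\Sigma^{-1} \sum_{\alpha} x_\alpha z_{1\alpha}' - \hat\Sigma^{-1} \hat{\mathrm{B}}_1^* \sum_{\alpha} z_{1\alpha} z_{1\alpha}' = 0$,

(2.22) $\hat\Sigma^{-1} \sum_{\alpha} x_\alpha v_\alpha' - \hat\Sigma^{-1} \hat{\mathrm{B}}_2 \sum_{\alpha} v_\alpha v_\alpha' + \hat\Gamma \hat\Phi = 0$,

(2.23) $\hat\Phi \hat{\mathrm{B}}_2' = 0$.

The solution of (2.21) for $\hat{\mathrm{B}}_1^*$ gives $B_1^*$. From (2.22) we obtain

(2.24) $\hat{\mathrm{B}}_2 = \sum_{\alpha} x_\alpha v_\alpha' Q^{-1} + \hat\Sigma \hat\Gamma \hat\Phi Q^{-1}$.

Multiplication on the left by $\hat\Gamma'$ and on the right by $Q$ gives

(2.25) $0 = \hat\Gamma' \sum_{\alpha} x_\alpha v_\alpha' + \hat\Gamma' \hat\Sigma \hat\Gamma \hat\Phi$.

Thus

(2.26) $\hat\Phi = -\hat\Gamma' \sum_{\alpha} x_\alpha v_\alpha'$.

Substitution into (2.24) gives

(2.27) $\hat{\mathrm{B}}_2 = (I - \hat\Sigma \hat\Gamma \hat\Gamma') B_2$.

In view of (2.26), (2.27), and (2.7), we derive from (2.23)

(2.28) $(I - \hat\Sigma \hat\Gamma \hat\Gamma') B_2 Q B_2' \hat\Gamma = 0$.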

We now consider (2.20), which can be written

(2.29) $N\hat\Sigma = A + \hat\Sigma \hat\Gamma \hat\Gamma' B_2 Q B_2' \hat\Gamma \hat\Gamma' \hat\Sigma$.

It is clear that if $\hat\Gamma_1$, $\hat\Sigma$, $\hat{\mathrm{B}}_2$ is one solution of (2.4), (2.5), (2.27), and (2.29),
another solution is $\hat\Gamma_2$, $\hat\Sigma$, $\hat{\mathrm{B}}_2$, where $\hat\Gamma_2 = \hat\Gamma_1 O$ and $O$ is any orthogonal matrix.
In this class of solutions let us consider those which make

(2.30) $\dfrac{1}{N} \hat\Gamma' B_2 Q B_2' \hat\Gamma = D$,

say, diagonal. Then (2.29) can be written

(2.31) $N\hat\Sigma = A + N\hat\Sigma \hat\Gamma D \hat\Gamma' \hat\Sigma$.

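Multiplication on the right by $\hat\Gamma$ gives

(2.32) $N\hat\Sigma\hat\Gamma = A\hat\Gamma + N\hat\Sigma\hat\Gamma D$,

or

(2.33) $N\hat\Sigma\hat\Gamma(I - D) = A\hat\Gamma$.

We can write (2.28) as

(2.34) $B_2 Q B_2' \hat\Gamma - N\hat\Sigma\hat\Gamma D = 0$.

Since $D$ is diagonal, multiplication of (2.34) on the right by $(I - D)$ gives

(2.35) $B_2 Q B_2' \hat\Gamma(I - D) - N\hat\Sigma\hat\Gamma(I - D)D = 0$.

Substitution from (2.33) gives

(2.36) $B_2 Q B_2' \hat\Gamma(I - D) - A\hat\Gamma D = 0$,

or

(2.37) $B_2 Q B_2' \hat\Gamma = (B_2 Q B_2' + A)\hat\Gamma D$.

Thus, the columns of $\hat\Gamma$ satisfy

(2.38) $[B_2 Q B_2' - d(B_2 Q B_2' + A)]\hat\gamma = 0$,

where $d$ is a root of

(2.39) $|B_2 Q B_2' - d(B_2 Q B_2' + A)| = 0$.

Let the roots of (2.39) be $d_1 \geq d_2 \geq \cdots \geq d_p \geq 0$. Each column of $\hat\Gamma$ satisfies
(2.10), where $\phi$ is one of

(2.40) $\phi_i = \dfrac{d_i}{1 - d_i}$.

Let the solutions of (2.10), normalized by

(2.41) $c_i' A c_j = N\delta_{ij}$

(where $\delta_{ij}$ is the Kronecker delta), be $c_1, \ldots, c_p$. If $\phi_i \neq \phi_j$, then (2.41) is
satisfied automatically for $i \neq j$. Then $\hat\gamma_i = k_i c_i$. To determine $k_i$, multiply (2.33)
on the left by $\hat\gamma_i'$. Then

(2.42) $N\hat\gamma_i' \hat\Sigma \hat\Gamma (I - D) = k_i c_i' A \hat\Gamma$.

Thus

(2.43) $N\hat\gamma_i' \hat\Sigma \hat\gamma_j (1 - d_j) = k_i c_i' A c_j k_j = N\delta_{ij} k_i^2$

for $i$ and $j$ being indices of $\hat\gamma_i$ in $\hat\Gamma$. Since $\hat\gamma_i$ is to be normalized by $\hat\Sigma$, we have
$k_i^2 = 1 - d_i$. Thus

(2.44) $\hat\gamma_i = \sqrt{1 - d_i}\; c_i = \dfrac{1}{\sqrt{1 + \phi_i}}\, c_i$.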

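Now we wish to show that we should take the vectors corresponding to the
$m$ smallest roots. Let $C = (c_1, \ldots, c_p)$, $C^* = (c_{p-m+1}, \ldots, c_p)$. From (2.29)
we obtain

(2.45) $N\hat\Sigma = A + \dfrac{1}{N} A\hat\Gamma (I - D)^{-1} D (I - D)^{-1} \hat\Gamma' A = A + \dfrac{1}{N} A C^* (I - D)^{-1} D C^{*\prime} A$,

if $\hat\Gamma = C^* (\sqrt{1 - d_i}\, \delta_{ij})$ for $i, j = p - m + 1, \ldots, p$. Then

(2.46) $C' \hat\Sigma C = I + \mathrm{diag}(0, \ldots, 0, \phi_{p-m+1}, \ldots, \phi_p) = I + \Phi^*$,

say. Thus

(2.47) $|\hat\Sigma| = |C^{-1}|^2\, |I + \Phi^*| = \left|\dfrac{1}{N}A\right| \prod_{i=p-m+1}^{p} (1 + \phi_i)$.

The logarithm of the maximized likelihood function is

(2.48) $\log \hat{L} = -\tfrac12 pN \log(2\pi) - \tfrac12 N \log\left|\dfrac{1}{N}A\right| - \tfrac12 N \log \prod_{i=p-m+1}^{p} (1 + \phi_i) - \tfrac12 pN$.

Thus $\log \hat{L}$ is maximized by choosing the smallest $\phi_i$. This completes the proof
of the theorem.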

It should be pointed out that $\Gamma$ can be normalized in other ways. Since $\hat\Psi$
was shown to be zero, it is clear that the maximum likelihood estimate of $\Gamma$
under other such linear conditions can be obtained from (2.15).

In the process of estimating $\Gamma$ we obtained an estimate of $\mathrm{B}_2$ of rank $p - m$.
This can be written as

(2.49) $\hat{\mathrm{B}}_2 = \hat\Sigma \hat\Gamma^* \hat\Gamma^{*\prime} B_2$,

where the columns of $\hat\Gamma^*$ are the vectors satisfying (2.10) for the $p - m$ largest
roots of (2.9). Fisher [7] obtained the same result for a special problem (see
Section 7). The author has given a different proof of this in [1]. Tintner [15]
and Geary [8] have considered the problem for $\Sigma$ known.

The joint asymptotic distribution of the characteristic roots and vectors
defined by (2.9), (2.10), and (2.11) has been given [2] under the general
conditions that $q_2 \geq p$ and $\dfrac{1}{N}\sum_{\alpha=1}^{N} z_\alpha z_\alpha'$ approaches a limit in such a way that, roughly
speaking, the multiplicities of the roots of $\left|\mathrm{B}_2 \dfrac{1}{N} Q \mathrm{B}_2' - \lambda\Sigma\right| = 0$ are the same
for all $N$ (see [2] for the exact conditions). In particular, if the nonzero roots
of $\left|\mathrm{B}_2 \lim_{N \to \infty} \dfrac{1}{N} Q\, \mathrm{B}_2' - \lambda\Sigma\right| = 0$ are all different, the limiting distribution of $C^*$
is given. Since $\operatorname{plim}_{N \to \infty} d_i = 0$ ($i = p - m + 1, \ldots, p$), it follows from the
theorems of Rubin quoted in [2] that the limiting distribution of $\hat\Gamma$ is the same
as that of $C^*$.


3. The likelihood ratio criterion for testing the number of linear restrictions.
If the number of independent restrictions on $\mathrm{B}_2$ is $m$, the rank of $\mathrm{B}_2$ is $r = p - m$.
Testing the hypothesis that the rank of $\mathrm{B}_2$ is $r_1$ against the alternative that it is
not greater than $r_2$ ($> r_1$) is equivalent to testing the hypothesis that the number
of restrictions on $\mathrm{B}_2$ is $m_1 = p - r_1$ against the alternative that it is
$m_2 = p - r_2$ ($< m_1$). The likelihood ratio criterion is the ratio of the likelihood
maximized under the hypothesis of $m_1$ to that maximized under the hypothesis
of $m_2$. From (2.48) we see that the criterion is

(3.1) $\lambda = \prod_{i=r_1+1}^{p} (1 + \phi_i)^{-N/2} \Big/ \prod_{i=r_2+1}^{p} (1 + \phi_i)^{-N/2} = \prod_{i=r_1+1}^{r_2} (1 + \phi_i)^{-N/2}$.

THEOREM 2. Suppose $x_\alpha$ (of $p$ components) is distributed according to
$N(\mathrm{B}_1 z_{1\alpha} + \mathrm{B}_2 z_{2\alpha}, \Sigma)$ ($\alpha = 1, \ldots, N$). The likelihood ratio criterion for testing the
hypothesis that the rank of $\mathrm{B}_2$ is $r_1$ against the alternatives that it is $r_2$ ($> r_1$) is (3.1),
where $\{\phi_i\}$ are the ordered roots (in descending order) of (2.10).

In particular, the criterion for $r = r_1$ ($m = m_1$) against all other possible
alternatives requires the product over all the $p - r_1$ ($= m_1$) smallest roots. In this
case

(3.2) $-2 \log \lambda = N \sum_{i=r_1+1}^{p} \log(1 + \phi_i)$

is for large samples approximately $N \sum_{i=r_1+1}^{p} \phi_i$, which has been suggested by
Fisher [7] and Hsu [9], [10] for testing this hypothesis.
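
To make (3.2) concrete, the following sketch evaluates the statistic and its large-sample approximation for a made-up set of roots. The function name `rank_lr_statistic` and the numerical values of the roots are assumptions for illustration only.

```python
import numpy as np

def rank_lr_statistic(phi, r1, N):
    """Return (-2 log lambda, N * sum of the p - r1 smallest roots) as in (3.2).

    phi holds the roots of (2.9); they are sorted in descending order here,
    and the p - r1 smallest ones enter the statistic."""
    phi = np.sort(np.asarray(phi, dtype=float))[::-1]
    tail = phi[r1:]
    exact = N * np.sum(np.log1p(tail))    # N * sum log(1 + phi_i)
    approx = N * np.sum(tail)             # N * sum phi_i
    return exact, approx

# Hypothetical roots for p = 4; the hypothesis is that the rank is r1 = 2
phi_example = [2.1, 0.9, 0.03, 0.01]
exact, approx = rank_lr_statistic(phi_example, r1=2, N=200)
print(exact, approx)
```

Because $\log(1 + \phi) < \phi$ for $\phi > 0$, the exact statistic is always somewhat smaller than the approximation, and the two are close when the smallest roots are near zero.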


THEOREM 3. Let $\dfrac{1}{N}Q$ approach a nonsingular limit as $N \to \infty$, and suppose $\mathrm{B}_2$ is
of rank $r_1$. Then $-2 \log \lambda$, where $\lambda$ is the likelihood ratio criterion defined in Theorem
2 for testing the hypothesis that the rank of $\mathrm{B}_2$ is $r_1$ against the alternatives that it is
not greater than $p$, is asymptotically distributed like $\chi^2$ with $(p - r_1)(q_2 - r_1)$
degrees of freedom.

PROOF. Let $\theta_i = N\phi_i$. It has been shown by Hsu [10] that
$\sum_{i=r_1+1}^{p} N\phi_i = \sum_{i=r_1+1}^{p} \theta_i$ has an asymptotic $\chi^2$-distribution with $(p - r_1)(q_2 - r_1)$ degrees
of freedom⁶. Let $\{\theta_{iN}\}$ be a sequence of real numbers such that $\theta_{iN} \to \theta_i$ for
each $i$. Then $N \sum_{i=r_1+1}^{p} \log(1 + \theta_{iN}/N) \to \sum_{i=r_1+1}^{p} \theta_i$. The proof is concluded
by applying the theorem of Rubin (see [2], Section 2).

It might be observed that Theorem 3 does not follow from the usual theorems
concerning the asymptotic distribution of $-2$ times the logarithm of a likelihood
ratio criterion. In fact, if $r_2 < \min(p, q_2)$, then $-2 \log \lambda$ does not have a limiting
$\chi^2$-distribution. However, as $N \to \infty$, its distribution approaches the limiting
distribution of $\sum_{i=r_1+1}^{r_2} \theta_i$. This distribution can be obtained from the limiting
distribution of $\{\theta_i\}$ [9], [2].


4. Testing hypotheses about the linear restrictions.

4.1. Case of one restriction. Suppose we wish to test the hypothesis that

(4.1) $g' \mathrm{B}_2 = 0$,

where $g$ is a specified ($p$-dimensional) vector. The ($q_2$-dimensional) vector
$B_2' g$ is distributed according to $N(\mathrm{B}_2' g,\; g'\Sigma g\, Q^{-1})$, where $Q$ is given by (1.5).
When the null hypothesis is true, $g' B_2$ has mean 0 and, therefore, $g' B_2 Q B_2' g / g'\Sigma g$
has a $\chi^2$-distribution with $q_2$ degrees of freedom. From the analysis of variance of
$g' x_\alpha$ we see that $g' A g / g'\Sigma g$ is distributed independently and according to a
$\chi^2$-distribution with $N - q$ degrees of freedom. When the null hypothesis is
true,

(4.2) $\dfrac{g' B_2 Q B_2' g}{g' A g} \cdot \dfrac{N - q}{q_2} = \dfrac{1}{q_2} \dfrac{g' B_2 Q B_2' g}{g' S g}$

has the F-distribution with $q_2$ and $N - q$ degrees of freedom. Therefore, we have
the following theorem.

6 This also follows from [2] and an application of the theorem of Rubin mentioned before.

THEOREM 4. Let $x_\alpha$ (with $p$ components) be distributed according to
$N(\mathrm{B}_1 z_{1\alpha} + \mathrm{B}_2 z_{2\alpha}, \Sigma)$, $\alpha = 1, \ldots, N$. Define the ($p \times q_2$) matrix $B_2$ by (1.10), $Q$ by
(1.5), and $S$ by (1.11). Then the critical region of a test of the hypothesis (4.1) at
significance level $\varepsilon$ is

(4.3) $\dfrac{g' B_2 Q B_2' g}{q_2\, g' S g} \geq F_{q_2, N-q}(\varepsilon)$.

It may be noticed that we do not need to put any condition on the rank of $\mathrm{B}_2$.
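
A minimal sketch of the statistic on the left of (4.3) follows. The function name `one_restriction_F`, the coefficient matrix, and the candidate vectors are hypothetical; the true $\mathrm{B}_2$ is built so that $g = (1, -1, 0)'$ satisfies (4.1) while $g = (0, 0, 1)'$ does not.

```python
import numpy as np

def one_restriction_F(g, B2, Q, S, q2):
    """Left-hand side of (4.3): g'B2 Q B2'g / (q2 * g'S g)."""
    g = np.asarray(g, dtype=float)
    return (g @ B2 @ Q @ B2.T @ g) / (q2 * (g @ S @ g))

rng = np.random.default_rng(7)                    # synthetic data, illustrative only
N, p, q1, q2 = 100, 3, 1, 2
q = q1 + q2
Z = rng.normal(size=(N, q))
Beta1 = rng.normal(size=(p, q1))
Beta2 = np.array([[1.0, 2.0],
                  [1.0, 2.0],
                  [0.5, -1.0]])                   # rows 1 and 2 equal: (1,-1,0) annihilates it
X = Z @ np.hstack([Beta1, Beta2]).T + rng.normal(size=(N, p))

B = np.linalg.solve(Z.T @ Z, Z.T @ X).T           # (1.10)
R = X - Z @ B.T
S = (R.T @ R) / (N - q)                           # (1.11)
Z1, Z2 = Z[:, :q1], Z[:, q1:]
Q = Z2.T @ Z2 - Z2.T @ Z1 @ np.linalg.solve(Z1.T @ Z1, Z1.T @ Z2)   # (1.5)
B2 = B[:, q1:]

stat_null = one_restriction_F([1.0, -1.0, 0.0], B2, Q, S, q2)    # hypothesis true
stat_other = one_restriction_F([0.0, 0.0, 1.0], B2, Q, S, q2)    # hypothesis false
stat_scaled = one_restriction_F([-2.5, 2.5, 0.0], B2, Q, S, q2)  # same g, rescaled
print(stat_null, stat_other)
```

The statistic does not depend on the scale of $g$, so no normalization of $g$ is needed for the test. The hypothesis would be rejected when the statistic is at least $F_{q_2, N-q}(\varepsilon)$, a value to be taken from F tables; it is not computed here.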
4.2. Case of several restrictions. Now consider testing the hypothesis that

(4.4) $g_i' \mathrm{B}_2 = 0$, $i = 1, \ldots, m$,

where $g_1, \ldots, g_m$ are given ($p$-dimensional) vectors. We shall assume that
$(g_1, \ldots, g_m) = G$ is of rank $m$ (otherwise some of (4.4) are redundant). The
($q_2$-dimensional) vectors $B_2' g_i$ are normally distributed with means $\mathrm{B}_2' g_i$ and
covariances

(4.5) $\mathcal{E}(B_2' g_i - \mathrm{B}_2' g_i)(B_2' g_j - \mathrm{B}_2' g_j)' = g_i' \Sigma g_j\, Q^{-1}$.

When the null hypothesis is true, the expected value of $B_2' g_i$ is 0 and $G' B_2 Q B_2' G$
is distributed according to $W(G'\Sigma G, q_2)$, that is, the Wishart distribution with
covariance matrix $G'\Sigma G$ and $q_2$ degrees of freedom.⁷ Also $G'AG$ is independently
distributed according to $W(G'\Sigma G, N - q)$. When the null hypothesis is true,

(4.6) $\dfrac{|G'AG|}{|G' B_2 Q B_2' G + G'AG|}$

has the $U_{m, q_2, N-q}$ distribution. The following theorem results:

THEOREM 5. Let $x_\alpha$ (with $p$ components) be distributed according to
$N(\mathrm{B}_1 z_{1\alpha} + \mathrm{B}_2 z_{2\alpha}, \Sigma)$, $\alpha = 1, \ldots, N$. Define the ($p \times q_2$) matrix $B_2$ by (1.10), $Q$ by
(1.5), and $S$ by (1.11). Then the critical region of a test of the hypothesis $G'\mathrm{B}_2 = 0$,
where $G$ is $p \times m$, at significance level $\varepsilon$ is

(4.7) $\dfrac{|G'AG|}{|G' B_2 Q B_2' G + G'AG|} \leq U_{m, q_2, N-q}(\varepsilon)$.

The above is the likelihood ratio test of hypothesis (4.4); this is based on
Wilks' test for the general linear hypothesis. Another test based on the test
suggested by Lawley (and later by Hotelling) for the general linear hypothesis is
based on the statistic $\mathrm{tr}[G' B_2 Q B_2' G (G'AG)^{-1}]$. The test (4.7) has the usual
properties of the likelihood ratio test; it is consistent; $-N$ times the logarithm
of (4.6) has approximately the $\chi^2$-distribution with $mq_2$ degrees of freedom
when $N$ is large (under the assumption that $\dfrac{1}{N}Q$ tends towards a nonsingular
limit).

7 $W(\Psi, r)$ is defined as the distribution of $\sum_{\alpha=1}^{r} Y_\alpha Y_\alpha'$, where the $Y_\alpha$ are independently
distributed according to $N(0, \Psi)$.
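
The two criteria of Section 4.2 can be sketched as follows. The function names are hypothetical, and `K` and `A` are random stand-ins for $B_2 Q B_2'$ and $A = (N - q)S$ rather than quantities computed from data. The check at the end illustrates that both statistics depend only on the space spanned by the columns of $G$: replacing $G$ by $GT$ with $T$ nonsingular leaves them unchanged.

```python
import numpy as np

def wilks_U(G, K, A):
    """Statistic (4.6): |G'AG| / |G'KG + G'AG|, with K standing for B_2 Q B_2'."""
    GAG = G.T @ A @ G
    return np.linalg.det(GAG) / np.linalg.det(G.T @ K @ G + GAG)

def lawley_hotelling_trace(G, K, A):
    """Trace statistic tr[G'KG (G'AG)^{-1}]."""
    return np.trace(G.T @ K @ G @ np.linalg.inv(G.T @ A @ G))

rng = np.random.default_rng(8)            # random stand-ins, illustrative only
p, q2, n_err, m = 4, 3, 50, 2
W0 = rng.normal(size=(p, q2))
K = W0 @ W0.T                             # plays the role of B_2 Q B_2'
R = rng.normal(size=(n_err, p))
A = R.T @ R                               # plays the role of A = (N - q) S

G = rng.normal(size=(p, m))               # columns g_1, ..., g_m
T = np.array([[2.0, 1.0],
              [0.5, 3.0]])                # any nonsingular m x m matrix

U_G, U_GT = wilks_U(G, K, A), wilks_U(G @ T, K, A)
tr_G, tr_GT = lawley_hotelling_trace(G, K, A), lawley_hotelling_trace(G @ T, K, A)
print(U_G, U_GT, tr_G, tr_GT)
```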
4.3. Approximate test of rank. If the rank of $\mathrm{B}_2$ is $p$, then (4.1) cannot be
satisfied for any vector $g$; that is, $B_2' g$ must have a mean different from 0 for
every $g$. This suggests that if we are interested in testing the hypothesis that
the rank of $\mathrm{B}_2$ is $p - 1$ against the hypothesis that it is $p$, a possible procedure
is to reject the hypothesis if (4.3) holds for every $g$. This will be true if the
minimum of the left hand side of (4.3) with respect to $g$ is greater than $F_{q_2, N-q}(\varepsilon)$.
The minimum is the smallest root of

(4.8) $\left|\dfrac{1}{q_2} B_2 Q B_2' - fS\right| = 0$.

The smallest root is $f_p = [(N - q)/q_2]\phi_p$. Thus a critical region for this test is

(4.9) $f_p \geq F_{q_2, N-q}(\varepsilon)$.

This test is "conservative"; that is, the probability is less than $\varepsilon$ of rejecting
the null hypothesis when it is true.

We can use the results of Section 4.2 to generalize this technique. Only if
$\mathrm{B}_2$ is of rank $p - m$ can (4.4) be true for $m$ linearly independent $g_i$. A possible
test of the hypothesis that $\mathrm{B}_2$ is of rank $p - m$ against alternatives that the rank
is greater consists of rejecting the hypothesis when (4.7) holds for all $G$ of rank
$m$, that is, if (4.7) holds when $G$ is chosen to maximize the left hand side. The
maximum is obtained by taking as columns of $G$ the vectors satisfying

(4.10) $[A - \psi(B_2 Q B_2' + A)]x = 0$

corresponding to the $m$ largest roots of

(4.11) $|A - \psi(B_2 Q B_2' + A)| = 0$.

Let these roots be $\psi_1 \geq \psi_2 \geq \cdots \geq \psi_p$. Then

(4.12) $\dfrac{|G'AG|}{|G' B_2 Q B_2' G + G'AG|} = \prod_{i=1}^{m} \psi_i$.

It is easily seen that $\psi_i = 1/(1 + \phi_{p+1-i})$. Therefore, a critical region for testing
the hypothesis that $\mathrm{B}_2$ has rank $p - m$ is

(4.13) $\prod_{i=p-m+1}^{p} \dfrac{1}{1 + \phi_i} \leq U_{m, q_2, N-q}(\varepsilon)$.

In other words, $U_{m, q_2, N-q}(\varepsilon)$ is an approximation to the significance point for
the criterion at significance level $\varepsilon$.
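
The relation between the roots of (4.11) and those of (2.9) can be checked numerically. In the sketch below `K` and `A` are random stand-ins for $B_2 Q B_2'$ and $A$, and `gen_eigvals` is a hypothetical helper that solves the determinantal equations through a Cholesky reduction.

```python
import numpy as np

def gen_eigvals(M, W):
    """Roots t of |M - t W| = 0 for symmetric M and positive definite W,
    returned in descending order."""
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    return np.sort(np.linalg.eigvalsh(L_inv @ M @ L_inv.T))[::-1]

rng = np.random.default_rng(9)            # random stand-ins, illustrative only
p, q2, n_err, m = 4, 5, 50, 2
W0 = rng.normal(size=(p, q2))
K = W0 @ W0.T                             # plays the role of B_2 Q B_2'
R = rng.normal(size=(n_err, p))
A = R.T @ R                               # plays the role of A = (N - q) S

phi = gen_eigvals(K, A)                   # roots of (2.9): phi_1 >= ... >= phi_p
psi = gen_eigvals(A, K + A)               # roots of (4.11): psi_1 >= ... >= psi_p

# psi_i = 1 / (1 + phi_{p+1-i}); reversing phi pairs largest psi with smallest phi
psi_from_phi = 1.0 / (1.0 + phi[::-1])

lhs = np.prod(psi[:m])                    # (4.12): product of the m largest psi
rhs = np.prod(1.0 / (1.0 + phi[-m:]))     # left side of (4.13): m smallest phi
print(lhs, rhs)
```

The two products agree, so the criterion (4.13) can be computed from the same roots $\phi_i$ that give the maximum likelihood estimates in Section 2.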


5. Confidence regions for the coefficients of the restrictions.

5.1. Case of one restriction. If $\mathrm{B}_2$ is of rank $p - 1$, there is one vector $\gamma$ satisfying

(5.1) $\gamma' \mathrm{B}_2 = 0$

and a normalization, say

(5.2) $\gamma' \Phi \gamma = 1$

(with the rank of $(\mathrm{B}_2 \;\; \Phi)$ being greater than that of $\mathrm{B}_2$). If $\Phi$ is a given (known)
matrix, then a confidence region for $\gamma$, given the statistics $B_2$ and $A$ (and $Q$),
consists of the vectors $g$ (satisfying $g'\Phi g = 1$) for which the test in Section 4.1
does not lead to rejection. If $\Phi = \Sigma$ then we make use of the fact that in this
case $\gamma' B_2 Q B_2' \gamma$ and $\gamma' A \gamma$ have independent $\chi^2$-distributions.

THEOREM 6. Let $x_\alpha$ (with $p$ components) be distributed according to
$N(\mathrm{B}_1 z_{1\alpha} + \mathrm{B}_2 z_{2\alpha}, \Sigma)$, $\alpha = 1, \ldots, N$. Define the $p \times q_2$ matrix $B_2$ by (1.10), the
nonsingular $Q$ by (1.5), and $S$ by (1.11). If the normalization of $\gamma$ is (5.2) for $\Phi$
known, then a confidence region for $\gamma$ defined by (5.1) with confidence coefficient
$1 - \varepsilon$ consists of the vectors $g$ satisfying

(5.3) $\dfrac{g' B_2 Q B_2' g}{q_2\, g' S g} \leq F_{q_2, N-q}(\varepsilon)$

and

(5.4) $g' \Phi g = 1$.

If the normalization is $\gamma' \Sigma \gamma = 1$, then a confidence region of confidence
$(1 - \varepsilon_1)(1 - \varepsilon_2)$ is the intersection of

(5.5) $g' B_2 Q B_2' g \leq \chi^2_{q_2}(\varepsilon_1)$

and

(5.6) $\underline{\chi}^2_{N-q}(\varepsilon_2) \leq g' A g \leq \bar{\chi}^2_{N-q}(\varepsilon_2)$,

where $\chi^2_{q_2}(\varepsilon_1)$ is chosen so that the probability of (5.5) is $1 - \varepsilon_1$ when $g = \gamma$, and
$\underline{\chi}^2_{N-q}(\varepsilon_2)$ and $\bar{\chi}^2_{N-q}(\varepsilon_2)$ are chosen so that the probability of (5.6) is $1 - \varepsilon_2$ when
$g = \gamma$.

These kinds of confidence regions were developed by Rubin and the author in
[3] following a suggestion by Girshick. Bartlett [5] has used this method in
treating an econometric problem.

5.2. Case of several restrictions. When $\mathrm{B}_2$ is of rank $p - m$, there are sets of $m$
linearly independent vectors $\gamma_{p-m+1}, \ldots, \gamma_p$ satisfying

(5.7) $\gamma_i' \mathrm{B}_2 = 0$.

Of course, if we take a set of $m$ linearly independent linear combinations of
$\gamma_{p-m+1}, \ldots, \gamma_p$ we obtain another set of vectors satisfying (5.7). We can take
out some of the indeterminacy in the definition of $\gamma_{p-m+1}, \ldots, \gamma_p$ by requiring

(5.8) $\gamma_i' \Sigma \gamma_j = \delta_{ij}$.

However, $\Gamma = (\gamma_{p-m+1}, \ldots, \gamma_p)$ can still be multiplied on the right by an
arbitrary orthogonal matrix. Let us suppose that there are $m(m - 1)/2$ more
independent restrictions on $\Gamma$,

(5.9) $f_\nu(\Gamma) = 0$, $\nu = 1, \ldots, m(m - 1)/2$,

with $f_\nu(\Gamma)$ being a completely specified function. We assume that (5.7), (5.8),
and (5.9) determine $\Gamma$ uniquely.⁸

If $g_i = \gamma_i$, then $g_i' B_2 Q B_2' g_i$ is distributed as $\chi^2$ with $q_2$ degrees of freedom
and independently of $g_j' B_2 Q B_2' g_j$ ($i \neq j$). The matrix $(g_i' A g_j)$ is independently
distributed according to $W(I, N - q)$. Then a confidence region may consist of
the sets of vectors $g_1, \ldots, g_m$ satisfying (5.9) with $G$ for $\Gamma$,

(5.10) $g_i' B_2 Q B_2' g_i \leq \chi^2_{q_2}(\varepsilon_i)$, $i = 1, \ldots, m$,

(5.11) $\underline{d}_{ij}(\varepsilon) \leq g_i' A g_j \leq \bar{d}_{ij}(\varepsilon)$,

where $\underline{d}_{ij}(\varepsilon) < \delta_{ij} < \bar{d}_{ij}(\varepsilon)$ are chosen so that the probability of (5.11) for all $i$
and $j$ is $1 - \varepsilon$ when $G'\Sigma G = I$. The confidence coefficient is
$(1 - \varepsilon_1)(1 - \varepsilon_2) \cdots (1 - \varepsilon_m)(1 - \varepsilon)$. Unfortunately, since the Wishart distribution has not been
tabulated, the intervals (5.11) could be obtained from present tables only for
$m = 2$, in which case one could use the distribution of the variances and the
correlation coefficient.

The confidence region defined by (5.10) and (5.11) has the same confidence
coefficient as the region which is the intersection of this and (5.9). If we do not
impose the conditions (5.9), there is the indeterminacy of orthogonal
transformations in the regions; that is, if $G$ is in the region, $GO$ is in the region if $O$
is orthogonal (for most $G$'s). If one is interested simply in estimating the linear
subspace spanned by $\gamma_{p-m+1}, \ldots, \gamma_p$, then this region (not imposing (5.9)) is
adequate.

Under the restrictions imposed here we could replace (5.10) by

(5.12) $\sum_{i=1}^{m} g_i' B_2 Q B_2' g_i \leq \chi^2_{mq_2}(\varepsilon^*)$,

and obtain a region with confidence coefficient $(1 - \varepsilon^*)(1 - \varepsilon)$. Other regions
could also be constructed by replacing (5.12) by other inequalities which take
into account that $g_i' B_2$ are normally distributed with mean 0 when $g_i = \gamma_i$.
For example, (5.12) could be replaced by

(5.13) $|G' B_2 Q B_2' G| \leq V_{m, N-q}(\varepsilon^*)$,

where $[1/(N - q)^m]\, V_{m, N-q}(\varepsilon^*)$ is the $\varepsilon^*$ significance point of the distribution of
the generalized variance of $m$ dimensions and $N - q$ degrees of freedom (for
covariance matrix $I$).

Another kind of sets of restrictions is

(5.14) $\gamma_i' \Sigma \gamma_j = 0$, $i \neq j$,

and (5.9) for $\nu = 1, \ldots, m(m + 1)/2$. When $G = \Gamma$, $g_i' B_2 Q B_2' g_i / g_i' \Sigma g_i$ has a
$\chi^2$-distribution with $q_2$ degrees of freedom and independent of $g_j' B_2 Q B_2' g_j$ ($i \neq j$)
and $g_i' A g_i$. Thus $[g_i' B_2 Q B_2' g_i / g_i' A g_i] \cdot [(N - q)/q_2]$ has the F-distribution with
$q_2$ and $N - q$ degrees of freedom. Moreover, the set of $m(m - 1)/2$ random
variables $g_i' A g_j / \sqrt{g_i' A g_i\; g_j' A g_j}$ ($i \neq j$) is independently distributed like the
set of correlation coefficients $r_{ij}$ ($i \neq j$; $i, j = 1, \ldots, m$) based on $N - q + 1$
observations from $N(0, I)$ using deviations from the sample mean. Define
$\underline{r}_{ij}(\varepsilon) < 0 < \bar{r}_{ij}(\varepsilon)$ for $i \neq j$ such that the probability of
$\underline{r}_{ij}(\varepsilon) \leq r_{ij} \leq \bar{r}_{ij}(\varepsilon)$ ($i \neq j$; $i, j = 1, \ldots, m$) is $1 - \varepsilon$. Then a confidence region with
coefficient $(1 - \varepsilon_1) \cdots (1 - \varepsilon_m)(1 - \varepsilon)$ is the intersection of

(5.15) $\dfrac{g_i' B_2 Q B_2' g_i}{g_i' S g_i} \leq q_2 F_{q_2, N-q}(\varepsilon_i)$,

(5.16) $\underline{r}_{ij}(\varepsilon) \leq \dfrac{g_i' A g_j}{\sqrt{g_i' A g_i\; g_j' A g_j}} \leq \bar{r}_{ij}(\varepsilon)$, $i \neq j$,

and $f_\nu(G) = 0$ for $\nu = 1, \ldots, m(m + 1)/2$. If one imposes $f_\nu(\Gamma) = 0$
($\nu = 1, \ldots, m$) only for normalization of the vector $\gamma_i$, there is the indeterminacy
of orthogonal transformations. In this case a confidence region may be the
intersection of (5.15), (5.16), and $f_\nu(G) = 0$, $\nu = 1, \ldots, m$. If there are no
restrictions $f_\nu(\Gamma) = 0$, the vectors $\gamma_i$ are not normalized. A confidence region then
may consist of the intersection of (5.15) and (5.16).

8 This problem of "identification" has been studied by Koopmans, Rubin and Leipnik
in [12].

Now let us suppose that equation (5.14) does not hold. Instead, suppose the
matrix $\Gamma$ is determined uniquely by restrictions (5.9) for $\nu = 1, \ldots, m^2$. Then,
in general, $g_i' B_2 Q B_2' g_i$ is not distributed independently of $g_j' B_2 Q B_2' g_j$ ($i \neq j$).
We now make use of the theory given in Section 4.2. A confidence region with
confidence coefficient $1 - \varepsilon$ is given by the intersection of

(5.17) $\dfrac{|G'AG|}{|G' B_2 Q B_2' G + G'AG|} \geq U_{m, q_2, N-q}(\varepsilon)$

and $f_\nu(G) = 0$ for $\nu = 1, \ldots, m^2$. If the restrictions $f_\nu(\Gamma) = 0$ are less than
enough for unique identification, (5.17) together with the restrictions imposed
on $G$ constitute a confidence region with coefficient $1 - \varepsilon$.
Let us summarize the above results for the cases of unique identification:

THEOREM 7. Let $x_\alpha$ (with $p$ components) be distributed according to
$N(B_1 z_{1\alpha} + B_2 z_{2\alpha}, \Sigma)$, $\alpha = 1, \ldots, N$. Define the $p \times q_2$ matrix $\hat{B}_2$ by (1.10), the nonsingular
$Q$ by (1.5), and $S = [1/(N-q)]A$ by (1.11). (a) A confidence region for the $m \times p$
matrix $\Gamma'$, the unique solution of $\Gamma' B_2 = 0$, $\Gamma'\Sigma\Gamma = I$, and $f_\nu(\Gamma) = 0$ ($\nu = 1,
\ldots, m(m-1)/2$) with confidence coefficient $(1 - \epsilon_1) \cdots (1 - \epsilon_m)(1 - \epsilon)$ consists
of matrices $G$ satisfying (5.11), (5.10), and $f_\nu(G) = 0$; a region of confidence
$(1 - \epsilon^*)(1 - \epsilon)$ consists of the set of matrices $G$ satisfying (5.11), (5.12),
and $f_\nu(G) = 0$, or the set of matrices $G$ satisfying (5.11), (5.13), and $f_\nu(G) = 0$.
(b) A region for $\Gamma'$, the unique solution of $\Gamma' B_2 = 0$, $\Gamma'\Sigma\Gamma$ being diagonal, and
$f_\nu(\Gamma) = 0$ ($\nu = 1, \ldots, m(m+1)/2$) of confidence $(1 - \epsilon_1) \cdots (1 - \epsilon_m)(1 - \epsilon)$
consists of the matrices $G$ satisfying (5.15), (5.16), and $f_\nu(G) = 0$.
(c) A region of confidence $1 - \epsilon$ for $\Gamma'$, the unique solution of $\Gamma' B_2 = 0$ and
$f_\nu(\Gamma) = 0$ ($\nu = 1, \ldots, m^2$) consists of the matrices $G$ satisfying (5.17) and
$f_\nu(G) = 0$.
In this section we have assumed that the restrictions on $\Gamma$ were just sufficient
to take out indeterminacy in the definition. If there are more restrictions (i.e.,
some restrictions are redundant), they may all be applied to the matrices $G$
which define the confidence regions.

It might be mentioned in passing that a confidence region for $B_2$ under the
restriction that the rank of $B_2$ is $r$ is given by the set of all matrices ($p \times q_2$) $\Theta^*$
of rank $r$ satisfying


(5.18)  $\dfrac{|A|}{|(\hat{B}_2 - \Theta^*) Q (\hat{B}_2 - \Theta^*)' + A|} \ge U_{p,q_2,N-q}(\epsilon).$


5.3. Consistency of confidence regions. It is clear from the preceding discussion
that there are many ways of constructing confidence regions for $\Gamma$. One desirable
property of a confidence region is that it is consistent. By consistency of a
confidence region we mean that if $\frac{1}{N} Q$ approaches a nonsingular limit (as $N \to \infty$)
the confidence region for $\Gamma$ is arbitrarily small with arbitrarily high probability
for $N$ large enough.

It is easy to verify that if there are restrictions on $\Gamma$ sufficient for identification
the regions given in Sections 5.1 and 5.2 are consistent. Consider, for example,
the first region given in Section 5.2. The inequalities (5.10) can be written


(5.19)  $g_i'\hat{B}_2 \left(\frac{1}{N} Q\right) \hat{B}_2' g_i \le \frac{1}{N}\,\chi^2_{q_2}(\epsilon_i).$


For $N$ sufficiently large the right-hand side of (5.19) is arbitrarily small, $N^{-1}Q$ is
arbitrarily near $\lim_{N\to\infty} N^{-1} Q$, and $\hat{B}_2$ is arbitrarily near $B_2$ with probability
arbitrarily near one. If $G'B_2 \neq 0$, then $N$ can be chosen large enough so that $G'\hat{B}_2$ will
have an arbitrarily small probability of satisfying (5.19).

As a matter of fact, consistency of the regions holds even if the assumptions,
such as normality of $x_\alpha$, are relaxed. Moreover, the confidence coefficient has
approximately the value given here if $N$ is sufficiently large although some of
the conditions are not fulfilled.⁹


9 See [4] for the treatment of the special case m = 1.


6. Econometric models.

6.1. Point estimates for certain “shock” models. In many econometric models
the relations between variables may be expressed in terms of a system of
stochastic linear equations


(6.1)  $\Phi x^*_\alpha + \Psi z_\alpha = \epsilon^*_\alpha,$


where $x^*_\alpha$ is a vector of $p^*$ endogenous (economic) variables and $z_\alpha$ is a vector of
$q$ exogenous (noneconomic) variables, and $\epsilon^*_\alpha$ is a vector of $p^*$ disturbances. This
model has been called a “shock model” [13]. For $\Phi$ square and nonsingular we
can solve (6.1) for¹⁰ $x^*_\alpha$,


(6.2)  $x^*_\alpha = -\Phi^{-1}\Psi z_\alpha + \Phi^{-1}\epsilon^*_\alpha.$


The distribution of $\epsilon^*_\alpha$ for fixed $z_\alpha$ induces the distribution of $x^*_\alpha$. Let

(6.3)  $-\Phi^{-1}\Psi = B^*,$

(6.4)  $\Phi^{-1}\epsilon^*_\alpha = \eta^*_\alpha.$


Equation (6.2) is the so-called “reduced form.”
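
As an illustration of the passage from the structural system (6.1) to the reduced form (6.2), the following minimal sketch solves a two-equation system exactly. The coefficient values and the names `phi`, `psi`, `solve2`, and `reduced_form` are hypothetical and chosen only for this example; they do not come from the paper.

```python
from fractions import Fraction as F

def solve2(phi, rhs):
    """Solve the 2x2 linear system phi x = rhs exactly by Cramer's rule."""
    (a, b), (c, d) = phi
    det = a * d - b * c
    if det == 0:
        raise ValueError("the structural matrix must be nonsingular")
    return [(rhs[0] * d - b * rhs[1]) / det,
            (a * rhs[1] - rhs[0] * c) / det]

def reduced_form(phi, psi, z, eps):
    """Return x with phi x + psi z = eps, i.e. x = -phi^{-1} psi z + phi^{-1} eps."""
    rhs = [eps[i] - sum(psi[i][j] * z[j] for j in range(len(z)))
           for i in range(2)]
    return solve2(phi, rhs)

# Hypothetical structural coefficients, exogenous values and disturbances.
phi = [[F(1), F(-2)], [F(1), F(1)]]
psi = [[F(-3), F(0)], [F(0), F(-1)]]
z = [F(2), F(5)]
eps = [F(1), F(-1)]

x = reduced_form(phi, psi, z, eps)

# Residual of the structural equations; it should vanish identically.
residual = [sum(phi[i][j] * x[j] for j in range(2))
            + sum(psi[i][j] * z[j] for j in range(2)) - eps[i]
            for i in range(2)]
```

Exact rational arithmetic is used here only so that the structural equations can be checked without rounding error; with real data one would use floating-point linear algebra instead.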

Suppose there are $m$ rows of $\Phi$ that are known to contain only $p$ components
of $x^*_\alpha$; let the vector of these $p$ components be $x_\alpha$, and let these $m$ rows and $p$
columns of $\Phi$ constitute a submatrix $\Gamma'$. Suppose that of these $m$ rows of $\Psi$ there
are only $q_1$ columns different from zero; let the subvector of $z_\alpha$ with nonzero
coefficients be $z_{1\alpha}$, and let the $m$ rows and $q_1$ columns of $\Psi$ constitute the matrix
$\Theta$. Let the remaining components of $z_\alpha$ constitute $z_{2\alpha}$. Thus we have partitioned
$\Phi$ and $\Psi$ as


(6.5)  $\Phi = \begin{pmatrix} \Gamma' & 0 \\ \Phi_{21} & \Phi_{22} \end{pmatrix}, \qquad \Psi = \begin{pmatrix} \Theta & 0 \\ \Psi_{21} & \Psi_{22} \end{pmatrix}.$


The $m$ equations we are primarily interested in are

(6.6)  $\Gamma' x_\alpha + \Theta z_{1\alpha} = \epsilon_\alpha.$

The part of (6.2) involving $x_\alpha$ is

(6.7)  $x_\alpha = B_1 z_{1\alpha} + B_2 z_{2\alpha} + \eta_\alpha.$

We shall assume that $\eta_\alpha$ is distributed according to $N(0, \Sigma)$. Since the
coefficients of $z_{2\alpha}$ in (6.6) are zero,

(6.8)  $\Gamma' B_2 = 0.$

In order that (6.8) have a unique solution for $\Gamma'$ except for premultiplication
by an arbitrary nonsingular $m \times m$ matrix, we shall assume that $q_2 \ge p - m$
and that the rank of $B_2$ is $p - m$. Then the block of $m$ equations is identified.
To completely determine $\Gamma$ we may require that the columns $\gamma_i$ satisfy some
normalization conditions, for example,

(6.10)  $\gamma_i'\Sigma\gamma_i = 1,$

and that there are $m - 1$ coefficients in each row of $(\Gamma' \ \Theta)$ that are specified
to be zero. It has been shown that a given equation is then identified if the rank
of the matrix formed from $(\Gamma' \ \Theta)$ by taking the columns containing the zero
coefficients is $m - 1$ and if the rank of the $(p^* - m) \times (p^* - p + q_2)$ matrix
$(\Phi_{22} \ \Psi_{22})$ is $p^* - m$.


10 This could also be considered as an “error model” with $\Phi^{-1}\epsilon^*_\alpha$ the error part of $x^*_\alpha$ and
$z_\alpha$ not subject to error.
Suppose we have observations $(x_\alpha, z_{1\alpha}, z_{2\alpha})$, $\alpha = 1, \ldots, N$. Then we can
obtain an estimate $\hat{\Gamma}^*$ of $\Gamma$ satisfying the restrictions of Section 2 by the method
described in Section 2. We similarly have


(6.11) 6* = —fB, = —f BF. 


To obtain $\hat{\Gamma}$ satisfying the restrictions above we must find a matrix $D$ such
that


(6.12)  $(\hat{\Gamma}' \ \ \hat{\Theta}) = D(\hat{\Gamma}^{*\prime} \ \ \hat{\Theta}^*)$


satisfies these restrictions. The identifying restrictions make $D$ unique. Suppose
a given row of $(\Gamma' \ \Theta)$ is $(\gamma', 0, \theta', 0)$ where there are $m - 1$ coefficients 0. Let
$(\hat{\Gamma}^{*\prime} \ \hat{\Theta}^*)$ be partitioned similarly into


(6.13)  $(\hat{\Gamma}_1^{*\prime} \ \ \hat{\Gamma}_2^{*\prime} \ \ \hat{\Theta}_1^* \ \ \hat{\Theta}_2^*).$

Then the corresponding row $d$ of $D$ must satisfy

(6.14)  $(0 \ \ 0) = d\,(\hat{\Gamma}_2^{*\prime} \ \ \hat{\Theta}_2^*).$


The matrix on the right is of rank $m - 1$, and the solution for $d$ is unique except
for a proportionality constant. That is determined by the sample equivalent of
(6.10).

The type of shock model considered in this section seems special. However,
the method of estimation may be useful if a block of $m$ equations is identified
even though the restrictions on $(\Gamma' \ \Theta)$ are more than enough to identify each
equation within the set of $m$. One could ignore the surplus restrictions.

In time series analysis the index $\alpha$ denotes the time. In many models
components of $z_\alpha$ may be components of $x_{\alpha-1}, x_{\alpha-2}, \ldots$ (i.e., lagged values). The
estimates given above are nevertheless maximum likelihood estimates.

6.2. Confidence regions for coefficients in “shock models.” The shock models
treated in Section 6.1 are of a special sort in that a block of 0 coefficients is
required to be given by a priori conditions. The idea of confidence regions
considered in Section 5 can be used, however, in more general circumstances. We
shall now discuss this subject in greater generality. For convenience we shall
modify the notation of Section 6.1. Let $x_\alpha$ be the ($p$-component) vector of all
the endogenous variables and $z_\alpha$ the ($q$-component) vector of all exogenous
variables, and let us write the set of “structural” equations as


(6.15)  $\Phi x_\alpha + \Psi z_\alpha = \epsilon_\alpha.$


Let $\Phi^{-1}\Psi = -B$ and $\Phi^{-1}\epsilon_\alpha = \eta_\alpha$. Then the “reduced form” of (6.15) is

(6.16)  $x_\alpha = B z_\alpha + \eta_\alpha.$


We assume $\eta_\alpha$ to be distributed according to $N(0, \Sigma)$. We are interested in the
first $m$ rows of $\Phi$, which we shall call $\Gamma'$, and the first $m$ rows of $\Psi$, which we shall
call $\Theta$.


We shall suppose that the restrictions for effecting identification are of one of
the three alternative kinds: (1) $\Gamma'\Sigma\Gamma = I$ and $\gamma_{ij} = 0$, $\theta_{kl} = 0$ for certain pairs
$(i, j)$ and $(k, l)$; (2) $\Gamma'\Sigma\Gamma$ diagonal, $\gamma_{ij} = 0$ and $\theta_{kl} = 0$ for certain pairs $(i, j)$
and $(k, l)$, and $f_\nu(\Gamma) = 0$ for $\nu = 1, \ldots, m$ (for normalization); and (3) $\gamma_{ij} = 0$,
$\theta_{kl} = 0$ for certain pairs $(i, j)$ and $(k, l)$, and $f_\nu(\Gamma) = 0$ for $\nu = 1, \ldots, m$ (for
normalization). These kinds of restrictions have been studied extensively [12].

In any case the ($m \times q$) matrix $\Gamma'\hat{B}$ (where $\hat{B}$ is defined by (1.10)) has a
normal distribution with expected value $\Gamma' B = -\Theta$; the covariance between the
$i, j$th element of $\Gamma'\hat{B}$ and the $k, l$th element is $\gamma_i'\Sigma\gamma_k\, m^{jl}$, where $(m^{jl}) = M^{-1}$
and


(6.17)  $M = \sum_{\alpha=1}^{N} z_\alpha z_\alpha'.$

Thus $(\Gamma'\hat{B} - \Theta) M (\hat{B}'\Gamma - \Theta')$ is distributed according to $W(\Gamma'\Sigma\Gamma, q)$. Furthermore
$\Gamma' A \Gamma$ is independently distributed according to $W(\Gamma'\Sigma\Gamma, N - q)$.

Let $\theta_i$ be the $i$th row of $\Theta$, and let $\theta_i^*$ consist of the $q_i^*$ components of $\theta_i$ which
are not specified to be 0. Let $B_i^*$ be composed of the columns of $B$ corresponding
to the components of $\theta_i^*$ so that $\gamma_i' B_i^* = -\theta_i^*$. Let $B_i^{**}$ be composed of the
other $q_i^{**}$ columns of $B$; then $\gamma_i' B_i^{**} = 0$. If $\hat{B}_i^*$ and $\hat{B}_i^{**}$ are formed similarly
from $\hat{B}$, then $\gamma_1'\hat{B}_1^{**}, \ldots, \gamma_m'\hat{B}_m^{**}$ are jointly normally distributed with means
0, and the covariances involve only $\Gamma$, $\Sigma$, and $M$.

Case 1. Since $\Gamma'\Sigma\Gamma = I$, the distribution $W(I, q)$ of $(\Gamma'\hat{B} - \Theta) M (\hat{B}'\Gamma - \Theta')$
and the distribution $W(I, N - q)$ of $\Gamma' A \Gamma$ have all parameters known. A
confidence region for $\Gamma$ and $\Theta$ of confidence $(1 - \epsilon)(1 - \epsilon_1) \cdots (1 - \epsilon_m)$ consists
of all $G$ and $T$ satisfying


(6.18)  $(g_i'\hat{B} - t_i) M (\hat{B}' g_i - t_i') \le \chi^2_q(\epsilon_i)$

and (5.11) and the identification conditions. It is understood that in $g_i$ and $t_i$
above we set those coefficients equal to 0 that are so specified in $\gamma_i$ and $\theta_i$,
respectively, by the a priori identification restrictions. A confidence region of
confidence $(1 - \epsilon)(1 - \epsilon^*)$ is the intersection of

(6.19)  $\operatorname{tr}\,(G'\hat{B} - T) M (\hat{B}' G - T') \le \chi^2_{mq}(\epsilon^*)$

and (5.11) with the identification conditions imposed on $G$ and $T$. Thirdly, a
confidence region of confidence $(1 - \epsilon)(1 - \epsilon^*)$ is the intersection of

(6.20)  $|(G'\hat{B} - T) M (\hat{B}' G - T')| \le V_{m,N-q}(\epsilon^*)$

and (5.11) with the identification conditions imposed on $G$ and $T$.

We can also construct confidence regions for $\Gamma$ alone. A region of confidence
$(1 - \epsilon)(1 - \epsilon_1) \cdots (1 - \epsilon_m)$ consists of $g_i$ satisfying (5.11), the identification
conditions on $\gamma_i$, and


(6.21)  $g_i'\hat{B}_i^{**} Q_i \hat{B}_i^{**\prime} g_i \le \chi^2_{q_i^{**}}(\epsilon_i),$


where $Q_i$ is composed of the rows and columns of $M^{-1}$ according to the way
$\hat{B}_i^{**}$ is composed of the columns of $\hat{B}$. A region of confidence $(1 - \epsilon)(1 - \epsilon^*)$
consists of $g_i$ satisfying (5.11), the identification conditions, and


(6.22)  $\sum_{i=1}^{m} g_i'\hat{B}_i^{**} Q_i \hat{B}_i^{**\prime} g_i \le \chi^2_{\sum q_i^{**}}(\epsilon^*).$


Case 2. Since $\Gamma'\Sigma\Gamma$ is diagonal, $(\gamma_i'\hat{B} - \theta_i) M (\hat{B}'\gamma_i - \theta_i') / \gamma_i'\Sigma\gamma_i$ is distributed
according to the $\chi^2$-distribution with $q$ degrees of freedom independently of
$(\gamma_j'\hat{B} - \theta_j) M (\hat{B}'\gamma_j - \theta_j')$ ($j \neq i$) and of $\gamma_i' A \gamma_i / \gamma_i'\Sigma\gamma_i$, the latter being distributed
according to the $\chi^2$-distribution with $N - q$ degrees of freedom. A confidence
region of confidence coefficient $(1 - \epsilon)(1 - \epsilon_1) \cdots (1 - \epsilon_m)$ consists of $G$ and
$T$ satisfying the identification conditions (including $f_\nu(G) = 0$), (5.16), and


(6.23)  $\dfrac{(g_i'\hat{B} - t_i) M (\hat{B}' g_i - t_i')}{g_i' A g_i} \cdot \dfrac{N - q}{q} \le F_{q,N-q}(\epsilon_i).$


If identification is effected by the restrictions $\gamma_{ij} = 0$, $\theta_{ij} = 0$, and $f_\nu(\Gamma) = 0$,
then (5.16) is unnecessary.

A confidence region for $\Gamma$ alone of confidence $(1 - \epsilon)(1 - \epsilon_1) \cdots (1 - \epsilon_m)$
consists of $G$ satisfying the identification conditions, (5.16), and


(6.24)  $\dfrac{g_i'\hat{B}_i^{**} Q_i \hat{B}_i^{**\prime} g_i}{g_i' A g_i} \cdot \dfrac{N - q}{q_i^{**}} \le F_{q_i^{**},N-q}(\epsilon_i).$


Case 3. In this case a region of confidence $1 - \epsilon$ consists of $G$ and $T$ satisfying
the identification conditions and


(6.25)  $\dfrac{|G'AG|}{|(G'\hat{B} - T) M (\hat{B}' G - T') + G'AG|} \ge U_{m,q,N-q}(\epsilon).$


A region for $\Gamma$ alone could be given, but since it is more complicated than the
above we shall not write it here.

It is clear that there are many ways of obtaining confidence regions. For other
combinations of identification conditions we could give similar kinds of
confidence regions. A property that all the regions given in this section have is
consistency (except the region involving (6.20)). In fact, if some of the assumptions,
such as that of normality of $\eta_\alpha$, are relaxed, the regions are nevertheless
consistent (see [4]). Furthermore, as shown in [4], the confidence coefficients of the
regions given above when $m = 1$ approach $1 - \epsilon$ as $N \to \infty$ under certain
conditions even though the variables are not normally distributed and even though
some of the components of $z_\alpha$ are “lagged” values of components of $x_\alpha$.
Similarly, it can be shown for $m$ greater than 1 that if the regions are used when the
assumptions are relaxed in certain ways one can have confidence about $1 - \epsilon$ if
$N$ is large enough. Sufficient conditions are given in Theorem 6 of [4].

It might be remarked in passing that if the number of 0 coefficients in $(\gamma_i' \ \theta_i)$
is more than enough for identification some columns can be dropped from
$\hat{B}_i^{**}$; if $p - 1$ columns remain one can determine confidence regions for $\gamma_i$ or
$(\gamma_i' \ \theta_i)$. It will also be noted that regions can easily be constructed when the
identification equations are of other kinds. In any case they are simply imposed
directly on $G$ and $T$ (as long as the restrictions do not involve $\Sigma$).

6.3. An “error” model. In an error model we consider each observed variable
as composed of a “systematic part” and an “error.” If $x_\alpha$ is the vector of
observed values, $\xi_\alpha$ the vector of systematic parts, and $v_\alpha$ the vector of errors,
then $x_\alpha = \xi_\alpha + v_\alpha$. The $m$ linear relations are taken to hold on the systematic
parts, that is,

(6.26)  $\Gamma'\xi_\alpha = 0.$


We shall assume that $v_\alpha$ is distributed according to $N(0, \Sigma)$, and that $\xi_\alpha$ is
“fixed.”

If $\Sigma$ is known, Tintner [15] has suggested estimating the columns of $\Gamma$ as the
vectors (or linear combinations of the vectors) satisfying


(6.27)  $\left(\sum_{\alpha=1}^{N} x_\alpha x_\alpha' - \lambda\Sigma\right)\gamma = 0$


corresponding to the $m$ smallest roots of


(6.28)  $\left|\sum_{\alpha=1}^{N} x_\alpha x_\alpha' - \lambda\Sigma\right| = 0.$

The obvious shortcoming of this procedure is that usually $\Sigma$ is unknown.

As a modification of this procedure Bartlett [5] in a special case and Geary
[8] when $\Sigma$ is known have suggested that $\xi_\alpha$ be represented as $B z_\alpha$, where the
components of $z_\alpha$ are given functions of time (preferably orthogonal functions).
It is clear that the methods proposed here can be used in these circumstances.


7. Another example; a q-sample problem. Consider $q$ multivariate normal
distributions $N(\mu_k, \Sigma)$ ($k = 1, \ldots, q$) with common covariance matrix. The
means may be represented as $q$ points in a $p$-dimensional space. We may ask
whether these points lie in an $r$-dimensional linear subspace, or we may ask what
is this $r$-dimensional subspace. Fisher [7] considered a related problem; a theory
about gene structure of three varieties of iris led to a hypothesis that the means
of three populations were on a line. (Since the relative distances on the line were
also specified, Fisher’s hypothesis could be reduced to a hypothesis specifying
rank zero.)

Suppose we have a sample $\{x_\alpha^{(k)}\}$ ($\alpha = 1, \ldots, N_k$) from each population
$N(\mu_k, \Sigma)$ ($k = 1, \ldots, q$). Let


(7.1)  $\bar{\mu} = \frac{1}{N}(N_1\mu_1 + \cdots + N_q\mu_q),$


where $N = N_1 + \cdots + N_q$. The hypothesis that the $\mu_k$ lie in an $r$-dimensional
space is equivalent to the hypothesis that


(7.2)  $(\mu_1 - \bar{\mu}, \ \ldots, \ \mu_q - \bar{\mu})$
is of rank $r$. It is well known (Hsu [11], for example) that this model can be put
into the form of (1.8). Thus we can apply Theorem 2 for deriving a test function.
Let


(7.3)  $\bar{x}_k = \sum_{\alpha=1}^{N_k} x_\alpha^{(k)} / N_k,$


(7.4)  $\bar{x} = \frac{1}{N}\sum_{k=1}^{q}\sum_{\alpha=1}^{N_k} x_\alpha^{(k)},$


(7.5)  $A = \sum_{k=1}^{q}\sum_{\alpha=1}^{N_k} (x_\alpha^{(k)} - \bar{x}_k)(x_\alpha^{(k)} - \bar{x}_k)'.$


Then $Z_k = \sqrt{N_k}\,\bar{x}_k$ is distributed according to $N(\sqrt{N_k}\,\mu_k, \Sigma)$. Let $F$ be an
orthogonal matrix with first row $(\sqrt{N_1/N}, \ldots, \sqrt{N_q/N})$. Let


(7.6)  $Y_l = \sum_{k=1}^{q} f_{lk} Z_k.$

Then

(7.7)  $\mathcal{E}\,Y_l = \sum_{k=1}^{q} f_{lk}\sqrt{N_k}\,\mu_k = \nu_l,$


say. The $Y_l$ are independently distributed according to $N(\nu_l, \Sigma)$. We have
$\nu_1 = \sqrt{N}\,\bar{\mu}$ and

(7.8)  $\nu_l = \sum_{k=1}^{q} f_{lk}\sqrt{N_k}\,(\mu_k - \bar{\mu}), \qquad l = 2, \ldots, q,$


and the rank of $(\nu_2, \ldots, \nu_q)$ is that of (7.2).

For the purposes of testing rank, a model equivalent to the one above is a
model with $N$ random variables, $Y_1, \ldots, Y_q$, and $N - q$ others independently
distributed according to $N(0, \Sigma)$. Let the $l$th coordinate of $z_\alpha$ be $\delta_{l\alpha}$, and let
$B = (\nu_1, \ldots, \nu_q)$. Then this model is that of Sections 2 and 3. $\hat{B} =
(Y_1, \ldots, Y_q)$ and $\hat{B}_2 Q \hat{B}_2'$ is


(7.9)  $\sum_{l=2}^{q} Y_l Y_l' = \sum_{l=1}^{q} Y_l Y_l' - Y_1 Y_1' = \sum_{k=1}^{q} N_k \bar{x}_k\bar{x}_k' - N\bar{x}\bar{x}'
        = \sum_{k=1}^{q} N_k(\bar{x}_k - \bar{x})(\bar{x}_k - \bar{x})'.$
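
The last equality in (7.9) can be checked numerically. The sketch below computes the between-sample matrix both ways with exact rational arithmetic; the data and the helper names (`mean_vec`, `outer`, `between_two_ways`) are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def mean_vec(rows):
    """Componentwise mean of a list of equal-length integer vectors."""
    n = len(rows)
    return [Fraction(sum(r[i] for r in rows), n) for i in range(len(rows[0]))]

def outer(u, v):
    """Outer product u v' as a list of rows."""
    return [[a * b for b in v] for a in u]

def between_two_ways(samples):
    """Return both sides of the last equality in (7.9).

    samples: list of q samples, each a list of p-component observations.
    """
    p = len(samples[0][0])
    N = sum(len(s) for s in samples)
    means = [mean_vec(s) for s in samples]
    grand = [sum(len(s) * m[i] for s, m in zip(samples, means)) / N
             for i in range(p)]
    centered = [[Fraction(0)] * p for _ in range(p)]
    raw = [[Fraction(0)] * p for _ in range(p)]
    for s, m in zip(samples, means):
        d = [m[i] - grand[i] for i in range(p)]
        dd, mm = outer(d, d), outer(m, m)
        for i in range(p):
            for j in range(p):
                centered[i][j] += len(s) * dd[i][j]   # N_k (xbar_k - xbar)(xbar_k - xbar)'
                raw[i][j] += len(s) * mm[i][j]        # N_k xbar_k xbar_k'
    gg = outer(grand, grand)
    uncentered = [[raw[i][j] - N * gg[i][j] for j in range(p)] for i in range(p)]
    return centered, uncentered

# Hypothetical data: q = 3 samples of sizes 2, 3, 1 in p = 2 dimensions.
samples = [[(1, 2), (3, 4)], [(0, 0), (2, 2), (4, 1)], [(5, 5)]]
centered, uncentered = between_two_ways(samples)
```

Both matrices agree entry by entry, which is the content of the identity; the roots of the determinantal equation in the text are then computed from this matrix together with $A$.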

Then $A$ is as defined in Section 1. The $c_i$ ($i = 1, \ldots, p$) are the characteristic
vectors of

(7.10)  $\left(\sum_{k=1}^{q} N_k(\bar{x}_k - \bar{x})(\bar{x}_k - \bar{x})' - \phi A\right) c = 0$
satisfying 
(7.11) Ac; = Nb; , 


where $\phi_i$ is the $i$th ordered root of


(7.12)  $\left|\sum_{k=1}^{q} N_k(\bar{x}_k - \bar{x})(\bar{x}_k - \bar{x})' - \phi A\right| = 0.$

The estimate of $\bar{\mu} = \frac{1}{\sqrt{N}}\,\nu_1$ is $\bar{x}$. Let


(7.13) R=$ (> wath .-) 


inl 1+ 


where & is given by (2.14). Then 
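(7.14)  $\hat{\nu}_l = R\,Y_l, \qquad l = 2, \ldots, q.$


Since $F$ is orthogonal, we have


(7.15)  $\sqrt{N_k}\,\hat{\mu}_k = \sum_{l=1}^{q} \hat{\nu}_l f_{lk} = R\sum_{l=2}^{q} Y_l f_{lk} + Y_1 f_{1k}
         = R\sum_{l=1}^{q} Y_l f_{lk} + (I - R)\,Y_1 f_{1k} = R\sqrt{N_k}\,\bar{x}_k + (I - R)\sqrt{N_k}\,\bar{x}.$

Thus

(7.16)  $\hat{\mu}_k = R(\bar{x}_k - \bar{x}) + \bar{x}.$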


The likelihood ratio criterion for testing the hypothesis that the rank of
$(\nu_2, \ldots, \nu_q)$, that is, the rank of (7.2), is $r$ is given by


» 
(7.17) Il a+), 

f=r+1 
where $\phi_{r+1}, \ldots, \phi_p$ are the $p - r$ smallest roots of (7.10).

An interesting example of the q-sample problem has been discussed by Cochran
[6]. Let $x_{i\alpha}^{(k)}$ be a measurement on the $\alpha$th replicate under the $k$th treatment,
measured on scale $i$. Among other questions concerning comparisons of the
scales, there is the problem of whether the scales are linearly related. This is
exactly the problem considered above.
ON CERTAIN METHODS OF ESTIMATING THE LINEAR
STRUCTURAL RELATION¹


By J. NEYMAN AND ELIZABETH L. SCOTT
University of California, Berkeley


1. Introduction and summary. The first part of this paper considers two 
methods of estimating the linear structural relation between two variables 
both of which are subject to “error”; the second part of the paper comments 
on a recently advanced procedure for constructing the confidence region for the 
slope of the structural relation. 

In 1940 Wald [1] initiated a certain procedure for estimating the linear struc- 
tural relation between two variables both of which are subject to “error.” 
Wald’s idea was extended by Nair and Banerjee [2] and later by Bartlett [3]. 
These procedures require some knowledge about the values of certain non- 
observable variables. When this knowledge is not available there is a temptation 
to substitute information derived from observations. One such method was 
considered by Wald who found sufficient conditions for the consistency of the 
resulting estimate. The purpose of the first part of the present paper is to find 
the necessary and sufficient conditions for two procedures with reference to a 
slightly more general case, namely, when the “errors” in the two observable 
variables may be correlated. The results obtained indicate that the two pro- 
cedures, applied in the case of no additional knowledge about the values of the 
non-observable variables, will lead to consistent estimates of the slope of the 
structural relation in very exceptional cases only. 

In 1949 Hemelrijk [4] described a novel procedure for constructing the con- 
fidence region of the slope of the linear structural relation in the case when the 
non-observable variables have unknown fixed values and the observations are 
made with “error” which has the same probability distribution at each point. 
The present paper considers this same procedure when there is no information 
about the fixed non-observable variables and also when these variables are ran- 
dom variables, and shows that the probability that the confidence region covers 
the true slope is the same as before but that the probability of covering any 
other slope is now the same as this probability of covering the true slope. 


2. Statement of the first problem. Let $\xi$, $u$, and $v$ denote random variables
with $E(\xi)$ finite and with $E(u) = E(v) = 0$, and let $\xi$ be independent of the pair
$u$, $v$. In Method 2 below, assume also that $\xi$, $u$, and $v$ have finite variances.

The three variables $\xi$, $u$, and $v$ are assumed to be nonobservable. However, it
is possible to observe the random variables $x$ and $y$ defined by


(1)  $x = \xi + u, \qquad y = \alpha + \beta\xi + v,$


where $\alpha$ and $\beta$ are unknown constants. The relation $\eta = \alpha + \beta\xi$ is called the
linear structural relation between the random variables $x$ and $y$. The variables
$u$ and $v$ are called the components of error although only part or, even, none of
them need represent “error” in the strict interpretation.


1 Prepared with the partial support of the Office of Naval Research, and presented at the
Chicago meeting of the Institute of Mathematical Statistics, December 29, 1950.


We consider that $n$ pairs of observations, say $x_i$, $y_i$, for $i = 1, 2, \ldots, n$,
will be made on $x$ and $y$. It will be assumed that the triplet $(\xi_i, u_i, v_i)$
corresponding to the $i$th pair of observations is completely independent of all other
such triplets. This will imply the independence of the pairs $(x_i, y_i)$ and $(x_j, y_j)$,
$i \neq j$. For the first part of the paper (Sections 2-5) we consider that after
the observations are obtained, they will be renumbered according to the
magnitude of $x$ so that $x_i \le x_{i+1}$ for $i = 1, 2, \ldots, n - 1$. However, this
renumbering will not be assumed in Section 6.

Two different procedures for estimating $\beta$ are considered.

METHOD 1. Fix two numbers $a \le b$ such that $P\{x \le a\} > 0$ and $P\{x > b\} > 0$.
Let $Z_1$, $W_1$ denote the arithmetic mean of the $x_i$’s and $y_i$’s, respectively, for
those pairs of observations for which $x_i \le a$, and $Z_2$, $W_2$ for those pairs for which
$x_i > b$. As an estimate of $\beta$, consider, say, $b_1 = (W_2 - W_1)/(Z_2 - Z_1)$.

METHOD 2. Fix two proportions, $p_1 > 0$ and $p_2 > 0$, such that $p_1 + p_2 \le 1$
and then let $r = [np_1]$ and $s = [np_2]$. Denote by $Z_3$, $W_3$ the arithmetic mean of
the $x_i$’s and $y_i$’s, respectively, for which $i = 1, 2, \ldots, r$; and by $Z_4$, $W_4$ the
corresponding mean for $i = n - s + 1, n - s + 2, \ldots, n$. The estimate of
$\beta$ is then, say, $b_2 = (W_4 - W_3)/(Z_4 - Z_3)$.

Both of these methods are tempting in practical applications involving the
estimation of the linear structural relation between two variables both of which
are observed with “error”. The purpose of the first part of the present paper is
to investigate the necessary and sufficient conditions for the consistency of the
estimate of $\beta$ in these two procedures.
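
The two grouping estimates can be written down directly. The following is a minimal sketch rather than a definitive implementation; the function names `method1` and `method2` and the sample data are hypothetical. The data are chosen error-free ($u = v = 0$), in which case both estimates reproduce the slope exactly.

```python
from math import floor

def _mean(values):
    return sum(values) / len(values)

def method1(xs, ys, a, b):
    """Estimate b_1: group means over {x_i <= a} and {x_i > b}, with a <= b."""
    low = [(x, y) for x, y in zip(xs, ys) if x <= a]
    high = [(x, y) for x, y in zip(xs, ys) if x > b]
    if not low or not high:
        raise ValueError("both groups must be nonempty")
    z1, w1 = _mean([x for x, _ in low]), _mean([y for _, y in low])
    z2, w2 = _mean([x for x, _ in high]), _mean([y for _, y in high])
    return (w2 - w1) / (z2 - z1)

def method2(xs, ys, p1, p2):
    """Estimate b_2: means over the r = [n p1] smallest and s = [n p2] largest x_i."""
    pairs = sorted(zip(xs, ys), key=lambda pair: pair[0])  # order by x only
    n = len(pairs)
    r, s = floor(n * p1), floor(n * p2)
    if r == 0 or s == 0:
        raise ValueError("both groups must be nonempty")
    low, high = pairs[:r], pairs[n - s:]
    z3, w3 = _mean([x for x, _ in low]), _mean([y for _, y in low])
    z4, w4 = _mean([x for x, _ in high]), _mean([y for _, y in high])
    return (w4 - w3) / (z4 - z3)

# Hypothetical error-free data on the line y = 2 + 3x.
xs = [4, 1, 8, 3, 6, 2, 7, 5]
ys = [2 + 3 * x for x in xs]
b1 = method1(xs, ys, a=3, b=5)
b2 = method2(xs, ys, p1=0.25, p2=0.25)
```

With errors present, the groups are selected on the observed $x$ rather than on $\xi$, which is the source of the inconsistency examined in the following sections.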


3. Necessary and sufficient conditions for the consistency of the estimate $b_1$.
We wish to compute the stochastic limit of the estimate $b_1$. In order to do this,
we shall use a slight generalization of the well known theorem of Khintchine
(see page 253 in [5]). The proof used for this lemma follows directly from an
unpublished result of Robert F. Tate.

THEOREM 1. (GENERALIZED THEOREM OF KHINTCHINE.) If $\{X_i\}$ is an infinite
sequence of random variables, all independent and having the same distribution with
$E(X_i) = \Xi$; further, if $\{\nu_n\}$ is an infinite sequence of integer-valued and positive
random variables tending in probability to infinity (that is, such that, for any $M$,
$\lim_{n\to\infty} P\{\nu_n < M\} = 0$), then as $n \to \infty$ the arithmetic mean of a random number
$\nu_n$ of variables $X_1, X_2, \ldots, X_{\nu_n}$ converges in probability to $\Xi$,


$\operatorname*{lim\,p}_{n\to\infty} \ \frac{1}{\nu_n}\sum_{i=1}^{\nu_n} X_i = \Xi.$

Each of the two terms in the numerator of $b_1$ is a mean of a random number,
say $\nu_n$, of random variables all having the same distribution function with finite
expected value. The same remark applies to the denominator of $b_1$. As $n \to \infty$,
the variable $\nu_n$ tends in probability to infinity. It follows that, at the same time,
each of the four means converges in probability to its expectation. Now, using
the theorem of Slutsky (see page 255 in [5]), we see that the stochastic limit of
$b_1$ is equal to $E(W_2 - W_1)/E(Z_2 - Z_1)$, provided this is finite. Thus, in the
following we shall be concerned with the conditions under which $E(W_2 - W_1) =
\beta E(Z_2 - Z_1)$.
We consider first the expected values,

$E(Z_1) = E(x \mid x \le a) = E(\xi + u \mid \xi + u \le a),$

$E(Z_2) = E(x \mid x > b) = E(\xi + u \mid \xi + u > b).$


In expression (1), we may set $\alpha = 0$ without loss of generality since we consider
only differences, $W_2 - W_1$, etc. We then have

$E(W_1) = E(y \mid x \le a) = \beta E(\xi \mid \xi + u \le a) + E(v \mid \xi + u \le a)$


and similarly

$E(W_2) = \beta E(\xi \mid \xi + u > b) + E(v \mid \xi + u > b).$

Thus,


(2)  $E(W_2 - W_1) = \beta E(Z_2 - Z_1) - E(\beta u - v \mid \xi + u > b) + E(\beta u - v \mid \xi + u \le a).$


Let $f(u)$ denote the expected value of $v$ given $u$ fixed. Then the expectation of
$v$ may be rewritten in terms of $f(u)$. Thus, for example,


$E(v \mid \xi + u > b) = E\{E[v \mid (\xi + u > b), u]\} = E\{f(u) \mid \xi + u > b\}.$

Now

(3)  $E(W_2 - W_1) = \beta E(Z_2 - Z_1) - E[\beta u - f(u) \mid \xi + u > b] + E[\beta u - f(u) \mid \xi + u \le a].$


It is seen that the necessary and sufficient condition that $b_1$ be a consistent
estimate of $\beta$ is that, say,


(4)  $I = E[\beta u - f(u) \mid \xi + u \le a] - E[\beta u - f(u) \mid \xi + u > b] = 0.$


Since the value of $\beta$ is unknown, it is of interest to ask for conditions which
will preserve the consistency of the estimate of $\beta$ no matter what the value of
$\beta$, $-\infty < \beta < \infty$, may be. Let $I_0$ be the value of $I$ when $\beta = \beta_0$ and $I_1$ be the
value of $I$ when $\beta = \beta_0 + 1$. Subtracting $I_0$ from $I_1$, we find that, say,

(5)  $J = E(u \mid \xi + u \le a) - E(u \mid \xi + u > b) = 0$


is a necessary condition for the consistency of the estimate $b_1$ irrespective of
the value of $\beta$. We shall see that $J = 0$ is also a sufficient condition.

Let $\Phi(\xi)$, $G(u)$, and $H(v)$ denote the distribution function of $\xi$, $u$, and $v$,
respectively, and let $F(x)$ denote the resulting distribution function of $x$. Now we
may write


$E(u \mid \xi + u \le a) = [F(a)]^{-1} \iint_{\xi + u \le a} u \, d\Phi(\xi)\, dG(u)$

$\qquad = [F(a)]^{-1} \int_{-\infty}^{+\infty} u\,\Phi(a - u)\, dG(u)$

$\qquad = [F(a)]^{-1} \int_{-\infty}^{+\infty} u\,[\Phi(a - u) - \Phi(a)]\, dG(u),$


since $E(u) = 0$. Similarly,

$E(u \mid \xi + u > b) = [1 - F(b)]^{-1} \int_{-\infty}^{+\infty} u\,[\Phi(b) - \Phi(b - u)]\, dG(u).$


Thus, in expression (5),


$J = \int_{-\infty}^{+\infty} u \left[\dfrac{\Phi(a - u) - \Phi(a)}{F(a)} + \dfrac{\Phi(b - u) - \Phi(b)}{1 - F(b)}\right] dG(u).$


It is easy to see that, unless both terms in the expression in square brackets are
zero, the integrand is always negative. Thus, the necessary and sufficient
conditions for $J = 0$ are


(6)  $\Phi(a - u) - \Phi(a) = 0, \qquad \Phi(b - u) - \Phi(b) = 0$


for all values of $u$ except for a set of probability zero.
Let $(\mu, \nu)$ denote the shortest interval such that $P\{\mu \le u \le \nu\} = 1$. We know
that $\mu \le 0 \le \nu$ since $E(u) = 0$. Then conditions (6) imply


(7)  $\Phi(a - \nu) = \Phi(a - \mu), \qquad \Phi(b - \nu) = \Phi(b - \mu);$


(8)  $P\{a - \nu < \xi \le a - \mu\} = 0, \qquad P\{b - \nu < \xi \le b - \mu\} = 0.$


Equations (8) are the necessary and sufficient conditions for $J = 0$ and,
therefore, the necessary conditions for the consistency of the estimate $b_1$. We shall
now prove that they are also sufficient. In order to do so, we consider


$\beta J - I = E[f(u) \mid \xi + u \le a] - E[f(u) \mid \xi + u > b]$

and show that when conditions (8) are satisfied then $\beta J - I = 0$. Under
conditions (8) we have


$\beta J - I = E[f(u) \mid \xi \le a - \nu] - E[f(u) \mid \xi > b - \mu],$


and, since the pair of random variables $u$, $v$ is independent of the random
variable $\xi$ and since $E(v) = 0$, we have $\beta J - I = 0$. Hence the conditions (8) are the
necessary and sufficient conditions for $I = 0$ irrespective of the value of $\beta$.
We now have proved the following theorem:

THEOREM 2. In order that $b_1$ preserve the property of being a consistent estimate
of $\beta$ irrespective of the value of $\beta$, $-\infty < \beta < \infty$, it is necessary and sufficient
that $P\{a - \nu < \xi \le a - \mu\} = P\{b - \nu < \xi \le b - \mu\} = 0$.
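
The role of the gap condition can be seen in a small discrete example, where the quantity $J$ of (5) is computed exactly. The distributions below are hypothetical and chosen only for illustration; the names `cond_mean_u` and `J` are likewise assumptions of this sketch. Here $u = \pm 1$ with equal probability, so that $\mu = -1$, $\nu = 1$, and with $a = b = 5$ the condition requires that $\xi$ have no probability in the interval $(4, 6]$.

```python
from fractions import Fraction

def cond_mean_u(xi_dist, u_dist, event):
    """E(u | event(xi + u)) for independent discrete xi and u given as {value: prob}."""
    num = Fraction(0)
    den = Fraction(0)
    for xi, p in xi_dist.items():
        for u, q in u_dist.items():
            if event(xi + u):
                num += u * p * q
                den += p * q
    return num / den

def J(xi_dist, u_dist, a, b):
    """J = E(u | xi + u <= a) - E(u | xi + u > b), as in (5)."""
    return (cond_mean_u(xi_dist, u_dist, lambda x: x <= a)
            - cond_mean_u(xi_dist, u_dist, lambda x: x > b))

half, third = Fraction(1, 2), Fraction(1, 3)
u_dist = {-1: half, 1: half}                  # E(u) = 0
gap = {0: half, 10: half}                     # no mass of xi in (4, 6]
no_gap = {0: third, 5: third, 10: third}      # xi = 5 lies in (4, 6]

J_gap = J(gap, u_dist, 5, 5)
J_no_gap = J(no_gap, u_dist, 5, 5)
```

In the first case $J$ vanishes, in agreement with the theorem; in the second it is negative, so consistency irrespective of $\beta$ fails.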




4. Necessary and sufficient conditions for the consistency of the estimate $b_2$.
We now compute the stochastic limit of the estimate $b_2$. Since each average in the
expression for $b_2$ is taken over dependent observations, the theorem of Khintchine
is not directly applicable. We shall evaluate the expectation and variance
of each average, shall show that each variance tends to zero and thus that each
average converges in probability to its expected value.

Letting $x_j$ denote the $j$th of the observed $x_i$’s, $i = 1, 2, \ldots, n$, numbered,
as above, in order of magnitude, we have


(9)  $E(x_j) = n\,C^{n-1}_{j-1} \int_{-\infty}^{+\infty} x\,[F(x)]^{j-1}[1 - F(x)]^{n-j}\, dF(x).$


Then


(10)  $E(Z_4) = \dfrac{n}{s}\sum_{j=n-s+1}^{n} C^{n-1}_{j-1} \int_{-\infty}^{+\infty} x\,[F(x)]^{j-1}[1 - F(x)]^{n-j}\, dF(x)
        = \dfrac{n}{s}\int_{-\infty}^{+\infty} x\, I_{F(x)}(n - s, s)\, dF(x),$


where $I_{F(x)}(n - s, s)$ is the incomplete Beta function,


$I_{F(x)}(n - s, s) = \dfrac{\int_0^{F(x)} t^{\,n-s-1}(1 - t)^{s-1}\, dt}{\int_0^{1} t^{\,n-s-1}(1 - t)^{s-1}\, dt}.$


Let $X_{1-p_2}$ denote the $(1 - p_2)$-percentile point of the distribution of $x$. As is
well known, when $n \to \infty$ with $s = [np_2]$ where $p_2$ is fixed, then $I_{F(x)}(n - s, s)$
tends to zero for all $x < X_{1-p_2}$ and to unity elsewhere. Thus


(11)  $\lim_{n\to\infty} E(Z_4) = \dfrac{1}{p_2}\int_{X_{1-p_2}}^{\infty} x\, dF(x) = E(x \mid x > X_{1-p_2}).$
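
The limiting behavior invoked for the incomplete Beta function can be observed numerically through the binomial sum appearing in (10). The sketch below evaluates that sum in log space to avoid overflow; the function name `upper_tail_weight` and the chosen values of $n$, $s$, and $F$ are assumptions made for illustration.

```python
from math import exp, lgamma, log

def upper_tail_weight(F, n, s):
    """Sum over j = n-s+1, ..., n of C(n-1, j-1) F^(j-1) (1-F)^(n-j), for 0 < F < 1.

    This is the weight multiplying x in (10); each term is formed in log
    space because the binomial coefficients overflow floats for large n.
    """
    total = 0.0
    for j in range(n - s + 1, n + 1):
        log_term = (lgamma(n) - lgamma(j) - lgamma(n - j + 1)
                    + (j - 1) * log(F) + (n - j) * log(1.0 - F))
        total += exp(log_term)
    return total

# Small case that can be checked by hand: 3/8 + 1/8.
small = upper_tail_weight(0.5, n=4, s=2)

# With p2 = 0.3 the weight switches from 0 to 1 near F = 1 - p2 = 0.7.
n, s = 2000, 600
below = upper_tail_weight(0.6, n, s)
above = upper_tail_weight(0.8, n, s)
```

For $F(x)$ below $1 - p_2$ the weight is negligible and above it the weight is essentially one, which is the step-function behavior used to obtain (11).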

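We now need to show that $\sigma^2_{Z_4} \to 0$ as $n \to \infty$ with $s = [np_2]$ where $p_2$ is fixed.
Consider


(12)  $s^2 Z_4^2 = \sum_{j=n-s+1}^{n} x_j^2 + 2\sum_{i=n-s+1}^{n-1}\ \sum_{j=i+1}^{n} x_i x_j.$


Repeating the reasoning leading to (10), we find


$E\left(\sum_{j=n-s+1}^{n} x_j^2\right) = n\int_{-\infty}^{+\infty} x^2\, I_{F(x)}(n - s, s)\, dF(x).$


Further, for $j > i$,


$E(x_i x_j) = n(n - 1)\,C^{n-2}_{i-1}\,C^{n-i-1}_{j-i-1} \int_{-\infty}^{+\infty} x\,[F(x)]^{i-1}
   \int_{x}^{+\infty} z\,[F(z) - F(x)]^{j-i-1}[1 - F(z)]^{n-j}\, dF(z)\, dF(x).$

Thus,


$2\sum_{i=n-s+1}^{n-1}\ \sum_{j=i+1}^{n} E(x_i x_j)
   = 2n(n - 1)\int_{-\infty}^{+\infty} x \int_{x}^{+\infty} z \sum_{i=n-s+1}^{n-1} C^{n-2}_{i-1}[F(x)]^{i-1}[1 - F(x)]^{n-i-1}\, dF(z)\, dF(x)$

$\qquad = 2n(n - 1)\int_{-\infty}^{+\infty} x\, I_{F(x)}(n - s, s - 1)\int_{x}^{+\infty} z\, dF(z)\, dF(x).$

Substituting into (12), we have


$E(Z_4^2) = \dfrac{n}{s^2}\int_{-\infty}^{+\infty} x^2\, I_{F(x)}(n - s, s)\, dF(x)
   + \dfrac{2n(n - 1)}{s^2}\int_{-\infty}^{+\infty} x\, I_{F(x)}(n - s, s - 1)\int_{x}^{+\infty} z\, dF(z)\, dF(x).$


Letting $n \to \infty$ with $s = [np_2]$ where $p_2$ is fixed, we obtain


$\lim_{n\to\infty} E(Z_4^2) = \dfrac{2}{p_2^2}\int_{X_{1-p_2}}^{\infty} x \int_{x}^{\infty} z\, dF(z)\, dF(x)
   = \dfrac{1}{p_2^2}\int_{X_{1-p_2}}^{\infty}\int_{X_{1-p_2}}^{\infty} x z\, dF(z)\, dF(x)
   = \left[\lim_{n\to\infty} E(Z_4)\right]^2.$


Therefore,


$\lim_{n\to\infty} \sigma^2_{Z_4} = 0,$

and it follows that as $n$ is increased $Z_4$ converges in probability to
$E(x \mid x > X_{1-p_2})$. Similarly, it can be shown that $Z_3$ converges in probability to
$E(x \mid x \le X_{p_1})$.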


We now want to compute $E(W_4)$. Let $y_j$ be the observation accompanying
$x_j$, the $j$th of the $x$’s in order of magnitude. Then


$E(y_j) = E[E(y_j \mid x_j)],$


and we have

$E(y_j) = \int_{-\infty}^{+\infty} E(y \mid x)\, n\,C^{n-1}_{j-1}[F(x)]^{j-1}[1 - F(x)]^{n-j}\, dF(x),$


so that


$E(W_4) = \dfrac{n}{s}\int_{-\infty}^{+\infty} E(y \mid x) \sum_{j=n-s+1}^{n} C^{n-1}_{j-1}[F(x)]^{j-1}[1 - F(x)]^{n-j}\, dF(x)
   = \dfrac{n}{s}\int_{-\infty}^{+\infty} E(y \mid x)\, I_{F(x)}(n - s, s)\, dF(x).$

Owing to the property of $I_{F(x)}(n - s, s)$ already mentioned, we thus have that,
as $n \to \infty$ with $s = [np_2]$ where $p_2$ is fixed,


$\lim_{n\to\infty} E(W_4) = \dfrac{1}{p_2}\int_{X_{1-p_2}}^{\infty} E(y \mid x)\, dF(x) = E(y \mid x > X_{1-p_2}).$


Combining the reasoning above with that used to obtain $E(Z_4^2)$, it is easy to
show that


$\lim_{n\to\infty} \sigma^2_{W_4} = 0.$

It follows that, as $n$ is increased, $W_4$ converges in probability to $E(y \mid x > X_{1-p_2})$.
In a similar way, we can show that $W_3$ converges in probability to
$E(y \mid x \le X_{p_1})$.

Now, noticing that the stochastic limits of $W_3$, $W_4$, $Z_3$, and $Z_4$ are identical
with the expectations of $W_1$, $W_2$, $Z_1$, and $Z_2$, respectively, we can use the
results obtained in Section 3 to establish Theorem 3.

Let $r = [np_1]$, $s = [np_2]$ and let $\xi_{p_1}$ and $\xi_{1-p_2}$ be the corresponding percentile
points of $\xi$, that is, such that $P\{\xi \le \xi_{p_1}\} = p_1$ and $P\{\xi > \xi_{1-p_2}\} = p_2$.

THEOREM 3. If $n \to \infty$ while $p_1$ and $p_2$ are held constant, the necessary and
sufficient condition that $b_2$ preserve the property of being a consistent estimate of $\beta$
irrespective of the value of $\beta$, $-\infty < \beta < \infty$, is that


$P\{\xi_{p_1} - \nu < \xi \le \xi_{p_1} - \mu\} = P\{\xi_{1-p_2} - \nu < \xi \le \xi_{1-p_2} - \mu\} = 0.$


5. Remarks. A. Wald [1] considered estimates similar to $b_2$ for the case u and v uncorrelated, with $p_1 = p_2 = \frac{1}{2}$, and showed that the conditions in Theorem 3 are sufficient conditions.

It is interesting that $b_1$ and $b_2$ may be consistent estimates of $\beta$ for some values of this parameter even though the conditions of the theorems are not satisfied. To see this, we return to formula (4). If u and v are dependent random variables and if the regression of v on u is represented by the equation $f(u) = \beta^* u$, then $b_1$ and $b_2$ are consistent estimates whenever $\beta = \beta^*$. On the other hand, if u and v are independent, then the terms in (4) involving f(u) drop out because E(v) = 0, and we find that whenever $\beta \ne 0$ the necessary and sufficient conditions for the consistency of the estimates $b_1$ and $b_2$ are just the conditions in Theorems 2 and 3, respectively. However, if $\beta = 0$ then $b_1$ and $b_2$ are certainly consistent estimates of $\beta$.

The results obtained suggest that the two procedures discussed will lead to consistent estimates of $\beta$ in very exceptional cases only.


6. On Hemelrijk's confidence region for $\beta$. Hemelrijk [4] considers the following construction of a confidence region, say B, for the slope $\beta$ of the linear structural relation. Let $(x_r, y_r)$ and $(x_s, y_s)$ be two different points chosen from the n observable points in any manner which is completely independent of the $u_i$, $v_i$ for $i = 1, 2, \ldots, n$. Consider the set B of values of the slope of two parallel straight lines, one through $(x_r, y_r)$ and the other through $(x_s, y_s)$, such that inside of the closed strip bounded by these two lines there are less than n - m observed points. Hemelrijk shows, under the conditions stated at the beginning of Section 2, with the additional assumptions that (a) whatever the fixed numbers $\theta$ and p, the probability is zero that the errors u and v will satisfy the relation $u\cos\theta + v\sin\theta = p$ and (b) the $\xi_i$, for $i = 1, 2, \ldots, n$, are unknown fixed numbers, that the probability that the set B includes $\beta$ is given by

(15)  $P\{\beta \in B\} = 1 - \dfrac{(m+1)(m+2)}{n(n-1)}$  for $0 \le m \le n - 3$.


It should be emphasized that unless some additional information, not assumed here, is available about the unobservable variables then, in order to fulfill the condition that the choice of the two points used to construct B be made in a manner which is completely independent of the $(u_i, v_i)$ for $i = 1, 2, \ldots, n$, this choice ordinarily will be made at random out of the n observed points. Also in many practical situations it does not seem appropriate to consider the values of $\xi$ as fixed constants. Rather, they are treated as independent samples of a random variable.

We now show that in either of these two cases, (i) when the values of $\xi$ are fixed constants but the choice of the observed points to be designated r and s is made in a random manner, and (ii) when $\xi$ is a random variable, the probability that the set B includes any fixed slope, say $\gamma$, is exactly the same as the probability that B includes the true slope $\beta$, as given by (15). The theorem is stated for the second case but the proof is identical in the two cases and is the same as that used by Hemelrijk to prove (15).

THEOREM 4. Whatever the fixed number $\gamma$ (whether coinciding with the slope $\beta$ of the structural relation or not) under the conditions given at the beginning of Section 2 plus the condition (a) above, the probability that the set B includes $\gamma$ is

(16)  $P\{\gamma \in B\} = 1 - \dfrac{(m+1)(m+2)}{n(n-1)}$

with $0 \le m \le n - 3$.


PROOF. The set B includes $\gamma$ if and only if the parallel lines of slope $\gamma$ through the two points $(x_r, y_r)$ and $(x_s, y_s)$ determine a closed strip which contains fewer than n - m points $(x_i, y_i)$. Let $z_i$ denote the distance, in an arbitrary fixed direction different from $\gamma$, from $(x_i, y_i)$ to any fixed line L of slope $\gamma$. Then $\gamma \in B$ if and only if fewer than n - m of the $z_1, z_2, \ldots, z_n$ lie in the closed interval $[z_r, z_s]$. Under the assumptions made, $z_1, z_2, \ldots, z_n$ are independent observations of the same random variable z (under the conditions of Hemelrijk, this is true only when $\gamma$ coincides with $\beta$), with probability one that the $z_i$'s are all different. Thus, the probability that $z_r$ is the jth smallest of the $z_i$'s is the same 1/n for every j. The same is true for $z_s$. The $z_i$'s may be arranged in n! ways. The number of arrangements for which fewer than n - m of the $z_i$'s lie in the closed interval $[z_r, z_s]$ is $n! - 2[(m+1)(n-2)! + m(n-2)! + \cdots + (n-2)!]$ so that the desired probability is

$$P\{\gamma \in B\} = 1 - \frac{(m+1)(m+2)}{n(n-1)}$$

with $0 \le m \le n - 3$.
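
The counting argument in the proof can be checked by brute force. The following is a minimal sketch, not part of the original paper: it assumes only what the proof states, namely that every ordered pair of distinct ranks for $(z_r, z_s)$ is equally likely, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def coverage_by_enumeration(n: int, m: int) -> Fraction:
    """P{gamma in B}, found by listing the n(n-1) equally likely rank pairs of (z_r, z_s)."""
    favourable = 0
    total = 0
    for rank_r, rank_s in permutations(range(1, n + 1), 2):
        total += 1
        # Number of z's lying in the closed interval bounded by z_r and z_s.
        inside = abs(rank_r - rank_s) + 1
        if inside < n - m:
            favourable += 1
    return Fraction(favourable, total)


def coverage_by_formula(n: int, m: int) -> Fraction:
    """Right-hand side of (16)."""
    return 1 - Fraction((m + 1) * (m + 2), n * (n - 1))


if __name__ == "__main__":
    for n in range(3, 9):
        for m in range(0, n - 2):
            print(n, m, coverage_by_enumeration(n, m), coverage_by_formula(n, m))
```

Working with ranks rather than with the n! arrangements gives the same probability, since the positions of the remaining n - 2 values do not affect the count.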

The theorem just proved implies that, under the conditions stated, the power of the test of the hypothesis that $\beta = \beta_0$, say, provided by the set B, is a constant equal to the probability of an error of the first kind.

The authors wish to emphasize that it is not their intention to criticize the 
elegant construction of Hemelrijk, which is perfectly correct in relation to the 
hypotheses he makes. The point under discussion is that, just as in the case of 
the result of Wald, one may feel tempted to apply Hemelrijk’s procedure some- 
what beyond the limits indicated. The results obtained here show that such ex- 
tensions are not profitable. Since the distinction between the conditions as- 
sumed by Hemelrijk and those at the outset of this paper is somewhat delicate, 
some illustrations may be interesting. 

(i) Consider that N astronomers propose to study the slope $\beta$ of the structural relation between two characteristics, $\xi$ and $\eta = \alpha + \beta\xi$, of the stars. Each astronomer will observe, independently from the others, the same n stars and will use his set of n pairs of observations, $\{x_i, y_i\}$, $i = 1, 2, \ldots, n$, to construct the confidence set B for the slope $\beta$. Furthermore, for the construction of B each will designate the observations from the same two stars chosen from the n stars in advance, say Castor and Pollux, as $(x_r, y_r)$ and $(x_s, y_s)$, respectively, and will use the same value for m. We have here the conditions assumed by Hemelrijk. Expression (15) holds but not necessarily expression (16) for $\gamma \ne \beta$. Thus, the expected proportion of the N sets B which include the true slope $\beta$ is $1 - (m+1)(m+2)[n(n-1)]^{-1}$, but this is not true, in general, for any other slope $\gamma \ne \beta$.

(ii) Consider a situation similar to that described in (i) except that each of the N astronomers chooses for himself, and in a random manner, the particular two stars, out of the n stars, to be designated as the rth and the sth in the construction of his set B. Now, whatever the number $\gamma$, whether coinciding with the true slope $\beta$ or not, the expected proportion of the sets B which include $\gamma$ is the same number, given by (16).

The same conclusion holds if this situation (ii) is altered by having each astronomer choose for himself the particular n stars that he will observe. However, if we consider the subset of the N astronomers, each choosing for himself the n stars that he will observe, who use the same two stars, say Castor and Pollux, to construct the set B, then (16) does not necessarily hold for $\gamma \ne \beta$; we have the same conclusion as with situation (i).

(iii) The situations above may be considered somewhat unrealistic. Presumably, the N astronomers would not each construct his own confidence set B for the same slope $\beta$. Rather, their observations would be combined and then one set B constructed from the combined observations. We may, however, consider the cases in general human experience in which, each for its own problem, a set B will be constructed. The expected proportion of these sets B which include any number $\gamma$ is exactly the same as the expected proportion which include the true slope, as given by Theorem 4.


SOME TESTS BASED ON DICHOTOMIZATION


By NILS BLOMQVIST
University of Stockholm and Boston University 


1. Summary. Some methods for testing independence between the components 
of a random vector are discussed. The basic principle in the construction of the 
tests is dichotomization of each component variable. The distributions are 
obtained under randomization. Other applications of the tests are mentioned 
(Section 2). Certain limiting distributions are derived (Section 4). The exact 
distribution of the test statistic in a special case is tabulated (Section 5). A 
brief study of an alternative test is made (Section 6). 


2. Introduction. Consider a random sample of n vectors from an m-dimensional 
population with unknown distribution. It is desired to have a nonparametric 
test of independence between the m random variables. Solutions applicable to 
this problem were given by E. J. G. Pitman [1], who studied the conditional 
distribution of a certain statistic under permutations of the actual observa- 
tions, and by M. Friedman [2] and M. G. Kendall and B. Babington Smith [3] 
using the method of ranks. At least in the latter method the absence of ties is 
essential, so that the observations of each component variable can be ordered. 
The present paper deals with the opposite situation, where the n observations 
of each variable are so heavily tied that it is possible to distinguish only between 
two groups, one higher and one lower, say. The situation can also be described 
as a dichotomization of distinct observations in order to simplify calculations, 
in situations where such simplifications (and loss of efficiency) can be afforded. 

If all observations belonging to a higher group are replaced by scores one and the others by scores zero, and if the numbers of score one are $n_1, n_2, \ldots, n_m$ ($0 < n_j < n$, $j = 1, 2, \ldots, m$), respectively, then the observations may after the dichotomization be represented by the following matrix

(1)
$$\begin{array}{cccc|c}
 & & & & \text{totals} \\
x_{11} & x_{12} & \cdots & x_{1m} & y_1 \\
x_{21} & x_{22} & \cdots & x_{2m} & y_2 \\
\vdots & \vdots & & \vdots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nm} & y_n \\
\hline
n_1 & n_2 & \cdots & n_m & \text{(column totals)}
\end{array}$$

where each $x_{ij}$ is equal to either one or zero.

Since the sample is assumed to be random, all $\binom{n}{n_j}$ different assignments of scores in the jth ($j = 1, 2, \ldots, m$) column of (1) have the same probability. Denote the common expectation of the y's by $P = p_1 + p_2 + \cdots + p_m$, where $n_j = np_j$ ($j = 1, 2, \ldots, m$). Under the (null) hypothesis of independence we expect all y's to be in the neighborhood of P. It seems, therefore, appropriate to base a test of independence upon the deviations from these expected values. Accordingly, we define

$$S = \sum_{i=1}^{n} (y_i - P)^2$$

as a test function and consider large values significant.

It should be observed that the test based on S can be applied also in situa- 
tions other than the one considered above. First, when independence between 
the column vectors is assumed, we make the null hypothesis that all assignments 
of scores in a column are equally probable. In such a case we are dealing with 
tests of homogeneity between the rows of matrix (1). This situation has been 
considered by W. G. Cochran [4], whose statistic Q is the same as the one in 
Theorem 4 of the present paper. Cochran gave a nonrigorous proof of this the- 
orem. For the sake of completeness the rigorous proof is given here. Secondly, 
let the mn values be observations in a two-way classification with one observa- 
tion in each cell and assume that there are no column effects, in the sense of the 
analysis of variance. After choosing appropriate p-values the test S can be used 
to test the absence of row effects. This problem was studied by A. M. Mood and 
G. W. Brown ([5], p. 399) in the important special case when all p-values equal 
$\frac{1}{2}$, that is, when each column is dichotomized by the median. Theorems 3 and
4 in this paper are generalizations of results already given in [5], p. 399.

The p-values are considered nonrandom. This seems to be a proper assumption 
in the continuous case, since it is always possible to have the columns in (1) 
dichotomized by fixed quantiles. In the discrete case the test will be conditioned 
by this assumption. 

No attempts have been made in this paper to investigate the power of tests 
considered. Consequently, all statements refer to the situation when the null 
hypothesis holds true. 


3. Basic covariances. Before entering into discussion of the tests based on S we will introduce some notation and give some results for the case m = 2. Let

$$t_{jk} = \sum_{i=1}^{n} x_{ij}x_{ik}$$

be the number of rows in the matrix (1) having score one both in the jth and in the kth column and define

(2)  $q_{jk} = \dfrac{1}{n}\sum_{i=1}^{n} (x_{ij} - p_j)(x_{ik} - p_k) = \dfrac{t_{jk}}{n} - p_jp_k$

as the basic covariance between the two columns.

In a previous paper [6] the author has studied a test of independence between two random variables, based on the statistic $q_{12}$, for the special case $p_1 = p_2 = \frac{1}{2}$. Some of these results are generalized here.

Dichotomization of the jth and kth column is equivalent to constructing the following 2 X 2 table:

                              Number of scores

                                  Column k
                           one            zero                   Totals
    Column j    one        ν              n_j - ν                n_j
                zero       n_k - ν        n - n_j - n_k + ν      n - n_j

    Totals                 n_k            n - n_k                n

From this table the exact distribution of any $t_{jk}$ and, consequently, $q_{jk}$ is obtained:

(3)  $P\{t_{jk} = \nu\} = \dbinom{n_j}{\nu}\dbinom{n - n_j}{n_k - \nu}\Big/\dbinom{n}{n_k}$,

where

$$\max\{0,\, n_j + n_k - n\} \le \nu \le \min\{n_j, n_k\}.$$

The first two moments of the distribution of $q_{jk}$ are obtained from (3):

(4)  $E(q_{jk}) = 0$,
     $\sigma^2(q_{jk}) = p_jp_k(1 - p_j)(1 - p_k)/(n - 1)$.
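
As a check on (3) and (4), the moments can be computed exactly from the hypergeometric probabilities. The sketch below is an illustration added here, not the author's computation; the function names and the particular values n = 10, n_j = 4, n_k = 3 are assumptions chosen only for the example.

```python
from fractions import Fraction
from math import comb


def pmf_t(n: int, n_j: int, n_k: int) -> dict:
    """Exact distribution (3) of t_jk, the number of rows scoring one in both columns."""
    lowest = max(0, n_j + n_k - n)
    highest = min(n_j, n_k)
    return {
        v: Fraction(comb(n_j, v) * comb(n - n_j, n_k - v), comb(n, n_k))
        for v in range(lowest, highest + 1)
    }


def moments_q(n: int, n_j: int, n_k: int) -> tuple:
    """Mean and variance of q_jk = t_jk / n - p_j * p_k under (3)."""
    p_j, p_k = Fraction(n_j, n), Fraction(n_k, n)
    pmf = pmf_t(n, n_j, n_k)
    mean = sum(prob * (Fraction(v, n) - p_j * p_k) for v, prob in pmf.items())
    second = sum(prob * (Fraction(v, n) - p_j * p_k) ** 2 for v, prob in pmf.items())
    return mean, second - mean ** 2


if __name__ == "__main__":
    n, n_j, n_k = 10, 4, 3
    p_j, p_k = Fraction(n_j, n), Fraction(n_k, n)
    mean, variance = moments_q(n, n_j, n_k)
    print(mean, variance, p_j * p_k * (1 - p_j) * (1 - p_k) / (n - 1))
```

Exact rational arithmetic is used so that agreement with (4) is an identity rather than a floating-point approximation.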

From formula (3) also some asymptotic forms of the distribution of $q_{jk}$ (and $t_{jk}$) can be derived. The derivations are straightforward applications of Stirling's formula, wherefore we will give only the final results here.

THEOREM 1. If $p_j$ and $p_k$ remain fixed as $n \to \infty$, then $q_{jk}$ has in the limit a normal distribution with mean and standard deviation as given in (4).

THEOREM 2. If $\sqrt{n}\,p_j \to \lambda_j$ and $\sqrt{n}\,p_k \to \lambda_k$ as $n \to \infty$, then $t_{jk}$ has in the limit a Poisson distribution with parameter $\lambda_j\lambda_k$.
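
Theorem 2 can be illustrated numerically by comparing the exact probabilities (3) with Poisson probabilities for a large n. This is only a sketch under assumed values (n = 10000 and $n_j = n_k = 100$, so that $\lambda_j = \lambda_k = 1$); it shows closeness at one n and is not a proof of the limit.

```python
from math import comb, exp, factorial


def hypergeometric_pmf(n: int, n_j: int, n_k: int, v: int) -> float:
    """P{t_jk = v} from (3), evaluated with exact integers before the final division."""
    return comb(n_j, v) * comb(n - n_j, n_k - v) / comb(n, n_k)


def poisson_pmf(lam: float, v: int) -> float:
    """Poisson probability with parameter lam."""
    return exp(-lam) * lam ** v / factorial(v)


def largest_gap(n: int, n_j: int, n_k: int, up_to: int = 6) -> float:
    """Largest absolute difference between the two probabilities for v = 0, ..., up_to."""
    lam = (n_j / n ** 0.5) * (n_k / n ** 0.5)  # lambda_j * lambda_k
    return max(
        abs(hypergeometric_pmf(n, n_j, n_k, v) - poisson_pmf(lam, v))
        for v in range(up_to + 1)
    )


if __name__ == "__main__":
    print(largest_gap(10_000, 100, 100))
```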


4. Limiting distributions of S. We shall proceed to study the asymptotic 
behavior of the S-distribution under various assumptions regarding m, n and 
the p-values. From a practical point of view the case of large n should be most 
important when we are dealing with tests of independence between the columns 
in (1). In the case of testing for homogeneity between rows, however, large 
m-values become of main interest. Accordingly, we shall investigate both these 
cases and also the case when the p-values are small and n large. 

THEOREM 3. If m and all p-values remain fixed as $n \to \infty$, then S is asymptotically normally distributed with mean

$$n\sum_{j=1}^{m} p_j(1 - p_j)$$

and variance

$$\frac{4n^2}{n - 1}\sum_{j<k} p_jp_k(1 - p_j)(1 - p_k).$$
PROOF. It follows from the definition of S and $q_{jk}$ that

(5)  $S = \sum_{i=1}^{n}\Big\{\sum_{j=1}^{m}(x_{ij} - p_j)\Big\}^2 = n\sum_{j=1}^{m} p_j(1 - p_j) + 2n\sum_{j<k} q_{jk}$,

which proves that S essentially depends only upon the sum of the basic covariances. Furthermore,

$$L(q_{12}, \ldots, q_{m-1,m}) = \sum_{j<k}\sqrt{p_jp_k(1 - p_j)(1 - p_k)}\cdot\frac{q_{jk}\sqrt{n - 1}}{\sqrt{p_jp_k(1 - p_j)(1 - p_k)}} = \sqrt{n - 1}\sum_{j<k} q_{jk}$$

is a fixed (in n) linear form in the random variables

$$\frac{q_{jk}\sqrt{n - 1}}{\sqrt{p_jp_k(1 - p_j)(1 - p_k)}} \qquad (j, k = 1, 2, \ldots, m;\ j \ne k).$$

For large n these variables, according to Theorem 1, tend in probability to standardized normal variables. The linear form L tends to the same linear form in the limiting variables. Hence L is in the limit normally distributed. It is readily seen that the variables $q_{jk}$ are pairwise independent, from which it follows that also the limiting variables are pairwise independent. Consequently,

$$E(L) = 0, \qquad \sigma^2(L) = \sum_{j<k} p_jp_k(1 - p_j)(1 - p_k)$$

also in the limit. The theorem follows from (5).
THEOREM 4. If n and all p-values remain fixed as $m \to \infty$, then

$$\frac{(n - 1)S}{n\sum_{j=1}^{m} p_j(1 - p_j)}$$

has in the limit a $\chi^2$-distribution with n - 1 degrees of freedom, subject to the condition that

$$\lim_{m\to\infty}\frac{1}{m}\sum_{j=1}^{m} p_j(1 - p_j) \ne 0.$$

PROOF. The random vector $(y_1 - P, y_2 - P, \ldots, y_n - P)$ is a sum of m independent vectors with zero mean vectors and variances and covariances

$$\mu_{ii}^{(j)} = E(x_{ij} - p_j)^2 = p_j(1 - p_j) \qquad (i = 1, 2, \ldots, n),$$

$$\mu_{ik}^{(j)} = E(x_{ij} - p_j)(x_{kj} - p_j) = -\frac{p_j(1 - p_j)}{n - 1} \qquad (i, k = 1, 2, \ldots, n;\ i \ne k;\ j = 1, 2, \ldots, m).$$

Since all these vectors are uniformly bounded with probability one, the Lindeberg condition for the generalized central limit theorem ([7], p. 113) is fulfilled. It follows that the vector

$$\frac{1}{\sqrt{m}}(y_1 - P, y_2 - P, \ldots, y_n - P)$$

has a limiting n-dimensional normal distribution with zero mean vector and variances and covariances

$$\mu_{ii} = A, \qquad \mu_{ik} = -\frac{A}{n - 1} \quad (i \ne k),$$

where

$$A = \lim_{m\to\infty}\frac{1}{m}\sum_{j=1}^{m} p_j(1 - p_j).$$

(From the assumptions made in Section 2 it follows that $A \ne 0$.) Hence the vector

$$\sqrt{\frac{n - 1}{Amn}}\,(y_1 - P, y_2 - P, \ldots, y_n - P)$$

has a limiting normal distribution with zero mean vector and variances and covariances

(6)  $\beta_{ii} = (n - 1)/n$  ($i = 1, 2, \ldots, n$),
     $\beta_{ik} = -1/n$  ($i, k = 1, 2, \ldots, n$; $i \ne k$).

The covariance matrix constructed from (6) has n - 1 characteristic numbers equal to one and one characteristic number equal to zero. Consequently ([8], p. 314),

$$\frac{n - 1}{Amn}\sum_{i=1}^{n}(y_i - P)^2 = \frac{(n - 1)S}{Amn}$$

has a limiting $\chi^2$-distribution with n - 1 degrees of freedom, which proves the theorem.
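
The statement about the characteristic numbers of the matrix built from (6) can be verified without computing eigenvalues: a symmetric matrix that equals its own square has only the characteristic numbers zero and one, and its trace counts the ones. The sketch below uses that route; it is an added illustration with assumed helper names, not the argument of [8].

```python
from fractions import Fraction


def limiting_covariance(n: int) -> list:
    """Matrix with (n - 1)/n on the diagonal and -1/n elsewhere, as in (6)."""
    return [
        [Fraction(n - 1, n) if i == k else Fraction(-1, n) for k in range(n)]
        for i in range(n)
    ]


def mat_mul(a: list, b: list) -> list:
    """Plain matrix product for square matrices stored as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][l] * b[l][k] for l in range(size)) for k in range(size)]
        for i in range(size)
    ]


def is_projection_of_rank(matrix: list, rank: int) -> bool:
    """True when the matrix is idempotent and its trace equals the given rank."""
    idempotent = mat_mul(matrix, matrix) == matrix
    trace = sum(matrix[i][i] for i in range(len(matrix)))
    return idempotent and trace == rank


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, is_projection_of_rank(limiting_covariance(n), n - 1))
```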

The next theorem concerns the case when all p-values tend to zero as n approaches infinity. A practical example where this situation becomes of interest is the following. Suppose that a large number (n) of persons are given some (m) psychological tests and we want to investigate whether or not these tests are independent of each other. For each person (i) and test (j), passing ($x_{ij} = 1$) or not passing ($x_{ij} = 0$) is registered. If then the number ($n_j$) of persons passing the jth test ($j = 1, 2, \ldots, m$) is small as compared with n, it seems preferable to use the approach of the following theorem to that of Theorem 4.

Since now most of the y-values in the sum column of matrix (1) will equal zero or one, it seems intuitively desirable to transform the test function S in such a way that the few y-values exceeding one are separated. Putting

$$T = \sum_{j<k} t_{jk},$$

we may write the expansion (5)

$$S = nP(1 - P) + 2T.$$

T depends only upon those y-values that exceed one, which follows from the fact that, if the number of y's equal to i is denoted by $r_i$, then

(7)  $T = \sum_{i=2}^{m}\dbinom{i}{2} r_i$,

a formula that can be used in computing T. We proceed to prove the following theorem on the limiting distribution of T, when the p-values tend to zero as $n^{-1/2}$.
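
The expansion (5), its rewritten form in terms of T, and the computing formula (7) are exact identities, so they can be checked on any dichotomized matrix. The following sketch builds one at random; the matrix size, the column totals and the seed are assumptions made only for this illustration.

```python
import random
from fractions import Fraction
from itertools import combinations
from math import comb


def random_dichotomized_matrix(n, column_totals, rng):
    """Build an n x m matrix of zeros and ones whose jth column holds exactly n_j ones."""
    columns = []
    for n_j in column_totals:
        column = [1] * n_j + [0] * (n - n_j)
        rng.shuffle(column)  # every assignment of the ones is equally likely
        columns.append(column)
    return [[column[i] for column in columns] for i in range(n)]


def summarize(x):
    """Return n, the p-values, P, S, the q_jk, T and the counts r_i for a 0/1 matrix."""
    n, m = len(x), len(x[0])
    p = [Fraction(sum(row[j] for row in x), n) for j in range(m)]
    P = sum(p)
    y = [sum(row) for row in x]  # row totals
    S = sum((y_i - P) ** 2 for y_i in y)
    t = {(j, k): sum(row[j] * row[k] for row in x) for j, k in combinations(range(m), 2)}
    q = {(j, k): Fraction(t_jk, n) - p[j] * p[k] for (j, k), t_jk in t.items()}
    T = sum(t.values())
    r = [y.count(i) for i in range(m + 1)]  # r[i] = number of rows with total i
    return {"n": n, "p": p, "P": P, "S": S, "q": q, "T": T, "r": r}


if __name__ == "__main__":
    stats = summarize(random_dichotomized_matrix(12, [3, 5, 2, 6], random.Random(0)))
    print(stats["S"], stats["T"], stats["r"])
```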
THEOREM 5. If m remains fixed and $\sqrt{n}\,p_j \to \lambda_j$ ($j = 1, 2, \ldots, m$) as $n \to \infty$, then T has in the limit a Poisson distribution with parameter $\sum_{j<k}\lambda_j\lambda_k$.

PROOF. Let $T_k = \sum_{j=1}^{k-1} t_{jk}$. Then

(8)  $T = T_2 + T_3 + \cdots + T_m$.

The main part of the proof of the theorem will be to show that, if we add an extra column to the matrix (1), then $T_{m+1}$ is in the limit Poisson distributed and independent of T.


As above, let $r_i$ ($i = 0, 1, \ldots, m$) be the number of y's equal to i. It is easily seen that $T_{m+1}$ depends upon T only through the r-values. Since the columns of matrix (1) are independent, the conditional distribution of $T_{m+1}$, given $r_0, r_1, \ldots, r_m$, is

(9)  $P\{T_{m+1} \mid r_0, r_1, \ldots, r_m\} = \sum \dbinom{n}{n_{m+1}}^{-1}\prod_{i=0}^{m}\dbinom{r_i}{x_i}$,

where the summation is extended over all x's such that

$$\sum_{i=0}^{m} x_i = n_{m+1},$$

(10)  $\sum_{i=0}^{m} i x_i = T_{m+1}$,

$$0 \le x_i \le r_i \qquad (i = 0, 1, \ldots, m).$$

We now let $n \to \infty$ and $n_j/\sqrt{n} \to \lambda_j$ ($j = 1, 2, \ldots, m$) while m and $T_{m+1}$ remain fixed. Because of (10) $x_1, x_2, \ldots, x_m$ are bounded and $x_0 = n_{m+1} - \sum_{i=1}^{m} x_i$ is of the order $\sqrt{n}$.

Because of (7), $r_2, r_3, \ldots, r_m$ are bounded. Furthermore, since

$$\sum_{i=0}^{m} r_i = n,$$

$$\sum_{i=0}^{m} i r_i = \sum_{j=1}^{m} n_j,$$

it is true that $r_0$ is of the order n and $r_1$ of the order $\sqrt{n}$. Applying Stirling's formula to the summand in (9) we obtain after some calculations

$$\lim_{n\to\infty}\binom{n}{n_{m+1}}^{-1}\prod_{i=0}^{m}\binom{r_i}{x_i} = \begin{cases} \dfrac{e^{-\Lambda}\Lambda^{T_{m+1}}}{T_{m+1}!} & \text{for } x_1 = T_{m+1},\ x_2 = x_3 = \cdots = 0, \\[2ex] 0 & \text{otherwise,} \end{cases}$$

where

$$\Lambda = \lambda_{m+1}\sum_{j=1}^{m}\lambda_j.$$

Hence, according to (9),

$$\lim_{n\to\infty} P\{T_{m+1} \mid r_0, r_1, \ldots, r_m\} = \frac{e^{-\Lambda}\Lambda^{T_{m+1}}}{T_{m+1}!}.$$

Since the r-values completely determine T, it follows that $T_{m+1}$ is in the limit independent of T and Poisson distributed with parameter $\Lambda$. Applying Theorem


TABLE I 
P {S = So} 


-_ —_ 


& 


1.000 1.000 = re - 

.960 .973 1.000 1.000 1.000 
.651 .745 , .583 .761 .840 
.549 .663 : .361 .576 .696 
.302 .435 ¢ .0278 .236 .399 
.225 .355 . 109 . 237 
okie . 237 . .0471 .147 
.102 .213 -0162 .0896 
.0400 .109 .0410 
-0315 .0971 | -0290 

.0563 -0008 .0110 

.0379 .0060 

.0299 | 2¢ .0024 

.0239 | .0006 

.0078 

.0030 

.0025 

.0017 

.0013 

-0005 

.0001

TABLE I, continued (n = 6 and n = 12; entries illegible in this copy)

2 for the case m = 2, we now proceed step by step to obtain the desired result that T is in the limit Poisson distributed with parameter

$$\sum_{k=2}^{m}\lambda_k\sum_{j=1}^{k-1}\lambda_j = \sum_{j<k}\lambda_j\lambda_k.$$

This completes the proof of the theorem.


5. The exact distribution of S in a special case. In the important special case when all p-values are equal to $\frac{1}{2}$ it follows from Theorems 3 and 4 that S is asymptotically normally distributed with mean mn/4 and variance $n^2 m(m - 1)/8(n - 1)$ as $n \to \infty$, and that $4(n - 1)S/mn$ is asymptotically $\chi^2$-distributed with n - 1 degrees of freedom as $m \to \infty$. These limiting distributions may be used as approximations when n or m is large.


TABLE II
Comparison between exact and approximate distribution of S at the 5% point

                             P{S ≥ S_0}
     n     m     S_0      exact    χ²-approx.    normal approx.

     4           22       .038     .044          .017
     6           17.5     .036     .042          .017
     8           16       .048     .050          .029
    10           20       .025     .038          .014
    12           17       .017     .039          .013
    14           17.5     .043     .063          .037
    16           20       .031     .050          .025


For small values of n and m the exact distribution of S is needed. This is given in Table I for the following cases:


n 4 6 8 10 12 14 #16 


m 38 35 34 34 3 3 38. 

The case n = 2 needs no consideration here since the test then reduces to the 
standard sign test. The case m = 2 is also excluded since it has already been 
tabulated in [6]. 

In Table II some comparisons are made between the exact distribution given 
in Table I and the approximations given in the paragraph above. The normal 
approximation has been applied after usual correction for continuity. For each 
pair (n, m) in Table II the best approximation has been underlined, which might 
serve as a guide in the choice of appropriate approximation. 
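
For small n and m the exact distribution of S with all p-values equal to $\frac{1}{2}$ can be obtained by direct enumeration of the column assignments. The sketch below is one way to do this and is not claimed to be the method used to construct Table I; the function names are assumptions. For n = 4, m = 3 it gives the upper tail probabilities 21/36, 13/36 and 1/36, that is .583, .361 and .0278, which agree with entries legible in Table I, and its mean and variance agree with mn/4 and $n^2 m(m - 1)/8(n - 1)$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def exact_distribution_of_S(n: int, m: int) -> dict:
    """Exact null distribution of S when each of the m columns has n/2 ones."""
    if n % 2:
        raise ValueError("n must be even so that every column can hold n/2 ones")
    P = Fraction(m, 2)
    # Each pattern lists the rows that receive score one in a column.
    patterns = list(combinations(range(n), n // 2))
    counts = Counter()
    for choice in product(patterns, repeat=m):
        y = [0] * n
        for rows in choice:
            for i in rows:
                y[i] += 1
        counts[sum((y_i - P) ** 2 for y_i in y)] += 1
    total = len(patterns) ** m
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


def upper_tail(distribution: dict, s0) -> Fraction:
    """P{S >= s0} under the enumerated distribution."""
    return sum((prob for s, prob in distribution.items() if s >= s0), Fraction(0))


if __name__ == "__main__":
    dist = exact_distribution_of_S(4, 3)
    for s0 in dist:
        print(s0, float(upper_tail(dist, s0)))
```

The number of assignments grows as $\binom{n}{n/2}^m$, so this brute-force approach is practical only for the smallest cases.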


6. Another test. Although it has not been mentioned explicitly against what 
alternative hypothesis the S-test is designed, it is clear that we have had in
mind the case when all m component variables of the random vector studied
are positively correlated. We do not intend to enter into a detailed discussion 
of the difficult question of alternatives. However, one case more shall be briefly 
mentioned. If about half of the variables are positively correlated with each 
other but negatively correlated with the rest of the variables, it is intuitively 
seen that the S-test will lose its power. Instead, a test should be used that is not 
based upon the algebraic sum of the basic covariances (5), but takes into account 
their absolute values. For example, a test based on the sum of the squares of 
the basic covariances might serve our purpose. In this connection we shall prove 
the following limiting theorem. 
THEOREM 6. If m and all p-values remain fixed as $n \to \infty$, then

$$(n - 1)\sum_{j<k}\frac{q_{jk}^2}{p_jp_k(1 - p_j)(1 - p_k)}$$

has in the limit a $\chi^2$-distribution with $\binom{m}{2}$ degrees of freedom.

PROOF. In the proof of Theorem 3 it was stated that the random variables

$$\frac{q_{jk}\sqrt{n - 1}}{\sqrt{p_jp_k(1 - p_j)(1 - p_k)}} \qquad (j, k = 1, 2, \ldots, m;\ j \ne k)$$

are pairwise independent and, as $n \to \infty$, normally distributed with zero mean and unit standard deviation. Hence, in the limit they are totally independent. The theorem follows.


The author expresses his indebtedness to Professor Frederick Mosteller of 
Harvard University for suggesting the original problem and for many helpful 
discussions. Miss Elizabeth Shuhany of Boston University has kindly assisted 
in the construction of tables.


SEQUENTIALLY DETERMINED STATISTICALLY
EQUIVALENT BLOCKS


By D. A. S. FRASER 
University of Toronto 


1. Summary. In 1943 Wald [2] gave a method for constructing tolerance 
regions in the multivariate case. Tukey generalized Wald’s procedure in [4] 
and the results were interpreted for discontinuous distributions in [5] and [6]. 

This paper presents a further generalization of the method so that statistically 
equivalent blocks can be determined sequentially; the particular function used 
to cut off a block may depend on the shape or structure of previously selected 
blocks. The results are also interpreted for the case of discontinuous distribu- 
tions. 

Possible advantages of applying the method are discussed. 


I. CONTINUOUS CASE


2. Introduction. The general consideration of statistically equivalent blocks has its origin in Wilks' method [1] of forming tolerance regions by using order statistics. For any interval formed from the order statistics, the proportion of the population "covered", referred to as the "coverage", was considered as the value of a random variable. Wilks showed that the distribution of this "coverage" was independent of the particular continuous population sampled; in fact, it has a Beta distribution depending only on the sample size and the particular order statistics chosen to form the interval.

The method was extended to multivariate populations by Wald and Tukey 
[2], [4], the latter being responsible for the term “statistically equivalent block” — 
the multivariate analogue of the interval between two adjacent order statistics. 
The coverages of these blocks, n + 1 of them for samples of n, have a very ele- 
mentary distribution closely related to that of the n order statistics of a sample 
from a uniform distribution [0, 1], and the coverage of any sum of blocks has a 
marginal Beta distribution. 

The method used in previous papers to form blocks is, essentially, to have 
a fixed sequence of functions which are used successively to cut off blocks from 
the space of the random variable being sampled. In this paper the fixed sequence 
is replaced by one having the choice of function at any point in the sequence 
depend on the observed values at the cuts of functions already used. More gen- 
erally the choice of function can be made randomly from a class of functions 
where both the probabilities and the class can depend on the functions already 
used and on their observed values. All the previous results still hold but the 
proof for the discontinuous case requires special treatment. A precise definition 
of the blocks is given in Section 4 and the general theorem in Section 5. 

Advantages of this generalization for the practitioner can be illustrated by 
the following examples. Consider a sample of 25 from a continuous bivariate distribution with the values plotted in Figure 1. Suppose a tolerance region is to be formed by deleting 12 blocks and a further requirement is made that the remaining region should be roughly of a given shape, say square or octagonal. Corresponding to the example in [4] we shall consider the latter.

The functions to be used to form the region will be the following:

$$y,\quad x,\quad -y,\quad -x,\quad x + y,\quad x - y,\quad -x - y,\quad -x + y.$$


Using the function y a block is formed by the method of [4], that is, the sample point yielding the largest value of y is selected and the first region consists of all points in the two-dimensional space having a larger value of y. Similarly form the second block using the function x: the method is the same as for the first block except that we consider only the n - 1 points remaining after deleting the one determining the first block, and only that part of the plane after removing the first block. Form successively in this way blocks corresponding to the eight functions.

FIG. 1

At this point we deviate from the procedure of [4]. For further functions we 
select from the given eight functional forms according to the values of the first 
eight functions at their respective cuts. To obtain a roughly octagonal region 
we shall make a ninth cut parallel to the shortest of the eight sides of the residual 
region. However, some of these sides may have vanished completely, in which 
case we take cuts parallel to the missing sides, commencing with the first when 
ordered according to the number of the function which produced the cut. This 
is carried out in the example in Figure 1 until twelve blocks altogether have been removed. The region T remaining after removal of the twelve blocks will be used as the tolerance region whose minimum coverage with a given confidence level can be calculated by the theorem in Section 5.

Consider a second example in which we use the sample of 25 plotted again in Figure 2 but desire a circular tolerance region. The functions used are the following:

$$y,\quad x,\quad -y,\quad -x,\quad (x - \alpha)^2 + (y - \beta)^2.$$

As before we remove four blocks using the first four functions, thus reducing the residual region to a rectangle. All further functions used will be identical to the fifth above where $\alpha$, $\beta$ are chosen to be the coordinates of the midpoint of the rectangle. Remove eight more blocks and use the residual T as a tolerance region with probabilities prescribed by the theorem in Section 5. Notice that tolerance regions so formed will be either circular or circular with two or four flat sides.

These simple procedures and many possible variations should permit the 
practitioner to impose quite general but approximate requirements on the 
shape of the final tolerance region. 


3. Notation. Consider a probability distribution over a space $\mathcal{S}$ which could be Euclidean of one or higher dimension, or more general. By this we mean there exists a nonnegative additive set function over the space with measure one for the whole space. Denote by w an arbitrary point in the space and let W, called the chance quantity, symbolize the existence of the probability measure defined above. The symbolic operations on a chance quantity are the obvious operations with the probability measure. For example, corresponding to a real-valued Borel function $\varphi(w)$ over the space $\mathcal{S}$, we can use the symbol $\varphi(W)$ for a chance quantity whose probability measure is defined by $P_{\varphi(W)}(S) = P_W(\varphi^{-1}(S))$, where S is a Borel set in $R^1$.

The expression “coverage of a set’? is to be interpreted as the probability 
measure of the set. If the set is a chance quantity, then its coverage is a real- 
valued chance quantity or random variable. 


4. Definition of the blocks. Consider n points in the space S = {w} and a 
family of functions ¢i(w), ge(w | ¢i), +++ , @m(W] er, -** » m1), each of which 
yields a random variable havi:.z a continuous distribution for all values of 
$1, $2, °** » Gi-1 except perhaps for a set having P-measure zero. 

In the theorem that follows in the next section these n points will be considered 
as a sample of n for the chance quantity W. 

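DEFINITION 4.1. The set $w_1, w_2, \cdots, w_n$ and the functions $\varphi_1, \varphi_2, \cdots, \varphi_m$
$(m \le n)$ define blocks as follows:

$$S_1 = \{w \mid \varphi_1(w) > \alpha_1\},$$
where $\alpha_1 = \max_i \varphi_1(w_i) = \varphi_1(w_{i(1)})$, which defines an i(1);
$$S_2 = \{w \mid \varphi_1(w) < \alpha_1,\ \varphi_2(w \mid \alpha_1) > \alpha_2\},$$
where $\alpha_2 = \max_i \varphi_2(w_i \mid \alpha_1) = \varphi_2(w_{i(2)} \mid \alpha_1)$ and $i(2) \ne i(1)$, which defines an i(2);
in general for $1 < k \le m$,
$$S_k = \{w \mid \varphi_1(w) < \alpha_1, \cdots, \varphi_{k-1}(w \mid \alpha_1, \cdots, \alpha_{k-2}) < \alpha_{k-1},\ \varphi_k(w \mid \alpha_1, \cdots, \alpha_{k-1}) > \alpha_k\},$$
where $\alpha_k = \max_i \varphi_k(w_i \mid \alpha_1, \cdots, \alpha_{k-1}) = \varphi_k(w_{i(k)} \mid \alpha_1, \cdots, \alpha_{k-1})$, the maximum
being taken over all i except $i(1), i(2), \cdots, i(k-1)$ and i(k) being chosen from
the set over which the maximum is taken.
If $m < n$, then
$$S_{m|n+1} = \{w \mid \varphi_1(w) < \alpha_1, \cdots, \varphi_m(w \mid \alpha_1, \cdots, \alpha_{m-1}) < \alpha_m\}.$$
The functions have thus defined n + 1 blocks if there are n functions, and if
fewer functions, then m blocks and an associated region $S_{m|n+1}$. The definition
of the blocks is unique unless $\varphi_i(w_j) = \varphi_i(w_k)$ for some i, j, k.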
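The sequential construction of Definition 4.1 can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the paper: the names `build_blocks` and `block_of` are hypothetical, each cutting function is passed as a callable `f(w, alphas)` receiving the earlier cut values, and ties are assumed not to occur (the continuous case).

```python
def build_blocks(points, functions):
    """Return the cut indices i(1..m) and cut values alpha_1..alpha_m.

    functions[k] is called as f(w, alphas), where alphas holds the cut
    values already determined, so later functions may depend on earlier cuts.
    """
    remaining = list(range(len(points)))
    alphas, cut_indices = [], []
    for f in functions:
        # Only points not already used as a cut compete for the maximum.
        best = max(remaining, key=lambda i: f(points[i], tuple(alphas)))
        cut_indices.append(best)
        alphas.append(f(points[best], tuple(alphas)))
        remaining.remove(best)
    return cut_indices, alphas


def block_of(w, functions, alphas):
    """0-based index k of the block S_{k+1} containing w.

    Returns len(functions) when w lies in the residual region.
    """
    previous = []
    for k, (f, alpha) in enumerate(zip(functions, alphas)):
        if f(w, tuple(previous)) > alpha:
            return k
        previous.append(alpha)
    return len(functions)
```

For instance, with points in the plane one might cut first on the x-coordinate and then let the second function depend on where the first cut fell, which is exactly the freedom the sequential definition allows.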


5. General results. Continuous case. 

THEOREM 5.1. If $\varphi_i(W \mid \varphi_1, \cdots, \varphi_{i-1})$ has a continuous distribution for all
values of $\varphi_1, \cdots, \varphi_{i-1}$ (except perhaps for a set of P-measure zero) and for all i,
and if for a sample of n, $(W_1, \cdots, W_n)$, from the distribution of W we define
blocks $S_1, \cdots, S_{m|n+1}$ according to Definition 4.1, then

(i) the blocks are disjoint chance sets uniquely defined with probability one, and

(ii) the distribution of the coverages

$$C_1 = P(S_1),\ \cdots,\ C_m = P(S_m),\ C_{m|n+1} = P(S_{m|n+1})$$
is the same as that of $t_1, t_2, \cdots, t_m$ and $\sum_{i=m+1}^{n+1} t_i$, where the $t_i$ are uniformly distributed
on the barycentric simplex with n + 1 vertices.¹

(i) and (ii) could be replaced by the statement

(iii) $S_1, \cdots, S_{m|n+1}$ are a partial family of statistically equivalent blocks of
type n + 1 and an associated $m|n+1$ tolerance region.

PROOF. The proof using Wald's principle and induction on m follows closely
that given in Section 8 of [4].
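A quick numerical check of part (ii) is possible in the simplest setting. The sketch below assumes W uniform on [0, 1] and $\varphi_1(w) = w$ (choices made here for illustration, not taken from the paper), so that $S_1$ is the interval above the sample maximum and its coverage is one minus that maximum. A single coordinate of a point uniformly distributed on the simplex with n + 1 vertices has mean 1/(n + 1), and the simulated mean should be close to that value.

```python
import random


def coverage_of_first_block(n, rng):
    # W uniform on [0, 1] and phi_1(w) = w: S_1 = {w > max of the sample},
    # so the coverage of S_1 is 1 - max.
    return 1.0 - max(rng.random() for _ in range(n))


def mean_first_coverage(n, trials, seed=0):
    # Average coverage of S_1 over repeated samples of size n.
    rng = random.Random(seed)
    return sum(coverage_of_first_block(n, rng) for _ in range(trials)) / trials
```

With n = 9 the average should settle near 0.1, up to simulation error.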


II. DISCONTINUOUS CASE


6. Introduction. Scheffé and Tukey [3] considered tolerance regions for
discontinuous one dimensional distributions; the previous results were extended,
with inequalities replacing the equalities.

The multivariate discontinuous case was considered by Tukey [5]. As well 
as blocks, cuts must now be considered and this complicates the formation of 
tolerance regions. Some remarks on the main theorem in [5] are contained in [6]. 

The results of [5] and [6] carry over to the case where the functions used to 
form the blocks are decided upon “sequentially.” The proof, although similar, 
requires special treatment and some new devices. 

It is perhaps worth remarking that although the functions in [5] reduce all 
cuts to points, this is not necessary. A cut could be a line with perhaps two or 
more points on it. Select one by a chance procedure (each with the same probabil- 
ity) to represent the cut. The remaining points are then available to fix the 
cuts for other blocks. 


7. Definition of the cuts and blocks. The formation of the m-system of func- 
tions needs to be altered slightly to take care of the new procedure admitting 
a choice of function at any stage. 

As in [5] we order finite sequences $(a_1, \cdots, a_m)$, $(b_1, \cdots, b_m)$ by means of
the following rule. $(a_1, \cdots, a_m) > (b_1, \cdots, b_m)$ if any of the following hold:

$$a_1 > b_1,$$
$$a_1 = b_1 \text{ and } a_2 > b_2,$$
$$\cdots$$
$$a_i = b_i\ (i < m) \text{ and } a_m > b_m.$$

We define < similarly, and = means identity.
DEFINITION 7.1. An m-system of functions $\Phi_1, \cdots, \Phi_m$ is defined as follows:

$$\Phi_k(w) = \{\varphi_{k,1}(w \mid \Phi_1, \cdots, \Phi_{k-1}), \cdots, \varphi_{k,p(k)}(w \mid \Phi_1, \cdots, \Phi_{k-1})\},$$

where $\varphi_{k,i}(w \mid \Phi_1, \cdots, \Phi_{k-1})$ is a real-valued Borel function of w in the space $\mathcal{S}$
and is also dependent on $\Phi_1, \cdots, \Phi_{k-1}$, where these are points in the Euclidean
spaces $R^{p(1)}, R^{p(2)}, \cdots, R^{p(k-1)}$ and where p(k) is a positive integer depending
on k.


1 See Tukey [4].

We can order values of a function $\Phi_k$ by the lexicographical method, described
above, for ordering sequences.
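The ordering rule is the usual lexicographical one: the first coordinate in which two sequences differ decides. A minimal sketch (the helper name `lex_greater` is hypothetical) makes the rule explicit; Python's built-in tuple comparison follows the same convention for sequences of equal length.

```python
def lex_greater(a, b):
    """True when (a_1, ..., a_m) > (b_1, ..., b_m) in the lexicographical order."""
    for x, y in zip(a, b):
        if x != y:
            # The first differing coordinate decides the comparison.
            return x > y
    # Identical sequences are equal, hence not greater.
    return False
```

In practice a vector-valued cutting function can therefore be represented by a function returning a tuple, and `max` applied to those tuples selects the lexicographically largest value.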
DEFINITION 7.2. Given an m-system of functions and n points $w_1, \cdots, w_n$ in
$\mathcal{S}$ $(m \le n)$, the corresponding blocks and cuts are defined by the following procedure:
Select i(1) to maximize $\Phi_1(w_i)$. If more than one value of i maximizes $\Phi_1$, choose
one at random (each taken to be equally likely). Let $\alpha_1$ be this maximum value of
$\Phi_1(w_i)$.
$$S_1 = \{w \mid \Phi_1(w) > \Phi_1(w_{i(1)})\},$$
$$T_1 = \{w \mid \Phi_1(w) = \Phi_1(w_{i(1)})\}.$$

Next, i(2) is selected $\ne i(1)$ to maximize $\Phi_2(w_i \mid \alpha_1)$, using the chance procedure
as for i(1) in case of ties. Let the maximum value be $\alpha_2$.

$$S_2 = \{w \mid \Phi_1(w) < \alpha_1,\ \Phi_2(w \mid \alpha_1) > \Phi_2(w_{i(2)} \mid \alpha_1)\},$$
$$T_2 = \{w \mid \Phi_1(w) < \alpha_1,\ \Phi_2(w \mid \alpha_1) = \Phi_2(w_{i(2)} \mid \alpha_1)\},$$
$$\cdots$$
$$S_{m|n+1} = \{w \mid \Phi_k(w \mid \alpha_1, \cdots, \alpha_{k-1}) < \alpha_k;\ k = 1, \cdots, m\}.$$

Also define $\bar{S}_1, \bar{S}_2, \cdots, \bar{S}_{m|n+1}$ by the expressions above for $S_1, S_2, \cdots, S_{m|n+1}$,
where < is replaced by $\le$ and > is replaced by $\ge$.

We denote by $\lambda$ a subset of the indices $1, 2, \cdots, m, m|n+1$.

DEFINITION 7.3. The block group $B_\lambda$ consists of the union of all $S_i$ with i in $\lambda$
and all $T_i$ not contained in $\bar{S}_i$ with i not in $\lambda$. The closed block group $\bar{B}_\lambda$ consists of
the union of all $S_i$ with i in $\lambda$ and all $T_i$ contained in any $\bar{S}_i$ with i in $\lambda$.

The above definition covers all cases where the functions are sufficient to 
reduce all cuts to points. However, if such is not the case we need the more 
general definition: 

DEFINITION 7.4. The closed block group $\bar{B}_\lambda$ consists of the union of all $\bar{S}_i$ with
i in $\lambda$. The block group $B_\lambda$ consists of the complement of $\bar{B}_{C(\lambda)}$, where $C(\lambda)$ is the
complement of $\lambda$ with respect to the indices $1, 2, \cdots, m, m|n+1$.

The definition of block groups is unique for a given set of points if all cuts
are points; otherwise the chance procedure determines the block groups for a
given set of points.

DEFINITION 7.5. As in [5] we let $c(\lambda)$, $\bar{c}(\lambda)$ stand for the coverages of the block
groups $B_\lambda$, $\bar{B}_\lambda$.


8. General results. Discontinuous case. 

THEOREM 8.1. Let $\Phi_1, \Phi_2, \cdots, \Phi_m$ be any m-system of functions (Definition
7.1), $W_1, \cdots, W_n$ be a sample of n from an arbitrary distribution designated by
W $(m \le n)$, and let the blocks, cuts, block groups, and coverages be formed according
to Definitions 7.2, 7.3, and 7.4. Then if $\lambda_1, \lambda_2, \cdots, \lambda_p$ are any set of disjoint $\lambda$'s,

$$\Pr\{c(\lambda_1) \le x_1, \cdots, c(\lambda_p) \le x_p\} \ge \Pr\{t(\lambda_1) \le x_1, \cdots, t(\lambda_p) \le x_p\} \ge \Pr\{\bar{c}(\lambda_1) \le x_1, \cdots, \bar{c}(\lambda_p) \le x_p\},$$
where $t(\lambda) = \sum_{i \in \lambda} t_i$, $t_{m|n+1} = \sum_{i=m+1}^{n+1} t_i$, and $t_1, t_2, \cdots, t_{n+1}$ have a uniform
distribution on the barycentric simplex.² In particular we have

$$\Pr\{c(\lambda) \le x\} \ge I_x(s, n + 1 - s) \ge \Pr\{\bar{c}(\lambda) \le x\},$$

where s is the number of indices in $\lambda$ (m = n) and $I_x(s, n + 1 - s)$ is the incomplete
beta function.
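For integer arguments the bounding quantity $I_x(s, n + 1 - s)$ can be evaluated without special-function libraries, using the standard identity that equates it to the upper tail of a binomial distribution, $\sum_{k=s}^{n} \binom{n}{k} x^k (1-x)^{n-k}$. The helper below is a sketch under that identity; its name is hypothetical.

```python
from math import comb


def incomplete_beta_int(x, s, n):
    """I_x(s, n + 1 - s) for integers 1 <= s <= n, via the binomial tail."""
    return sum(comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(s, n + 1))
```

For example, `incomplete_beta_int(0.5, 1, 2)` evaluates $1 - (1 - 0.5)^2$, the probability that a single block out of three has coverage at most one half in the continuous case.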


9. The functions $\psi$. In a manner similar to that of Section 6 in [5], we replace
the m-system of functions by real functions $\psi$.

LEMMA 9.1. Given an m-system of functions $\Phi_1, \cdots, \Phi_m$, there exist real
functions $\psi_1(w), \psi_2(w \mid \psi_1), \cdots, \psi_m(w \mid \psi_1, \cdots, \psi_{m-1})$ such that if $W_1, \cdots, W_n$ form
a sample of n from W, then

(i) $\psi_i(w \mid \psi_1, \cdots, \psi_{i-1})$ is defined except for values of $\psi_1, \cdots, \psi_{i-1}$ having
P-measure zero;

(ii) $\Pr\{\Phi_i(W_j)$ has a different relation (<, =, or >) to $\Phi_i(W_k)$ than that of
$\psi_i(W_j)$ to $\psi_i(W_k) \mid \Phi_1, \cdots, \Phi_{i-1}\} = 0$.

The functions $\psi_i$ depend on the underlying distribution as seen from their
definition below, but are only used as tools in proving the general theorem.

To prove Lemma 9.1 we need the following lemma:

LEMMA 9.2. Let $\Phi(w)$ be a finite sequence of real functions ordered
lexicographically (Definition 7.1) and let W be a chance quantity. Define

$$\psi(w) = \Pr\{\Phi(W) < \Phi(w)\}.$$

(i) For each value of $\psi(w)$ we are able to associate at most one value of $\Phi(w)$ with
unassociated values of P-measure zero.

(ii) If $W_1, \cdots, W_n$ is a sample of the chance quantity W, then with probability
one the relation (<, =, or >) between $\Phi(W_i)$ and $\Phi(W_j)$ is the same as that between
$\psi(W_i)$ and $\psi(W_j)$.
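The role of $\psi$ can be seen on a finite example. The sketch below assumes a distribution with finitely many attainable values of $\Phi$, each of positive probability (an assumption made here for illustration); it tabulates $\psi = \Pr\{\Phi(W) < \varphi\}$ for every attainable value $\varphi$. Because every listed value has positive probability, $\psi$ is strictly increasing along the lexicographical order, so comparing two values of $\psi$ gives the same relation as comparing the two vector values of $\Phi$. The function name `psi_table` is hypothetical.

```python
from fractions import Fraction


def psi_table(values_with_probs):
    """Map each attainable value phi (a tuple) to psi = Pr{Phi(W) < phi}."""
    # Tuples sort lexicographically, matching the ordering of Definition 7.1.
    ordered = sorted(values_with_probs)
    table, cumulative = {}, Fraction(0)
    for phi, prob in ordered:
        table[phi] = cumulative  # probability of strictly smaller values
        cumulative += prob
    return table
```

Exact fractions are used so that equalities between values of $\psi$ are not blurred by rounding.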

PROOF OF (i). Considering the function $\Phi(w)$, we ask when could two values
of it, say $\Phi'$, $\Phi''$, correspond to a single value of $\psi(w)$? Since the values of $\Phi(w)$
are ordered, this would mean

$$\Pr\{\Phi' \le \Phi(W) < \Phi''\} = 0,$$

and also any $\Phi$ between $\Phi'$ and $\Phi''$ would have the same value of $\psi$. These points
would form an interval for $\varphi_1$, or, if not, for $\varphi_2$, etc., and the P-measure for the
interval is zero. To the corresponding value of $\psi$ we associate the $\Phi(w)$ which is
the upper limit point of the interval. Since there can be at most a countable
number of disjoint intervals on the finite number of real lines and each with
P-measure zero, then the P-measure is zero for the values of $\Phi(w)$ which are not
associated with values of $\psi(w)$.

The proof of (ii) is given on p. 36 of [5].

The proof of Lemma 9.1 follows easily by using Lemma 9.2. The first part
of Lemma 9.2 shows that the definition of $\psi_i(w)$ for values of $\psi_1, \cdots, \psi_{i-1}$ is
unique except for values of $\psi_1, \cdots, \psi_{i-1}$ which have P-measure zero. This
establishes the first part of Lemma 9.1. The second part follows directly from
Lemma 9.2.


2 See Tukey [4].


10. The representation theorem. In the proof of the general theorem we can
no longer consider the joint distribution of $\{\psi_i(W)\}$ as in [5]. The new proof
does not need the general representation theorem of Section 8 in [5] but only
the one-dimensional representation theorem in [3].


11. Proof of the general theorem. According to Lemma 9.1, the indices
$i(1), \cdots, i(m)$ used to determine the blocks are with probability one the same
whether we use the $\psi_i$ or $\Phi_i$. Also, if we consider the blocks themselves, for
example

$$S_1^{\Phi} = \{w \mid \Phi_1(w) > \Phi_1(w_{i(1)})\},$$
$$S_1^{\psi} = \{w \mid \psi_1(w) > \psi_1(w_{i(1)})\},$$

Lemma 9.1 shows that these differ by a set of P-measure zero and hence have
identical coverages. Similarly for the other blocks. Hence it is sufficient to prove
our theorem using the real functions $\psi_1, \cdots, \psi_m$.

As in [5] we set up a continuous distribution which can produce by a mapping
a distribution equivalent to that of the $\psi_i$. It happens that for this continuous
distribution the functions used to form blocks can be preassigned.

Corresponding to $\psi_1(W)$ we define a function $g_1(U_1)$ of a uniform variate such
that the distributions are identical (see [3]). As in Lemma 9.2 there is at most
one value of $\psi_1(W)$ for each value of $U_1$ (if we neglect appropriate points of
P-measure zero) and at least one value of $U_1$ for each value of $\psi_1(W)$. Thus a
function depending on the value of $\psi_1(W)$ can just as well be determined by the
value of $U_1$.
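A function such as $g_1$ can be pictured, for a variate taking finitely many values, as the familiar quantile (inverse distribution function) mapping of a uniform variate. The sketch below is an illustration under that finite-support assumption, with a hypothetical name `make_g`: each value of u yields exactly one value, every value with positive probability is reached by a whole interval of u, and g is nondecreasing.

```python
def make_g(values, probs):
    """Return g(u): the smallest value whose cumulative probability exceeds u (0 <= u < 1)."""
    pairs = sorted(zip(values, probs))

    def g(u):
        cumulative = 0.0
        for value, prob in pairs:
            cumulative += prob
            if u < cumulative:
                return value
        # Guard against rounding when u is extremely close to 1.
        return pairs[-1][0]

    return g
```

The interval of u mapped to a single value is the set written $g_1^{-1}(g_1(u_1))$ in the text that follows, and the position of $u_1$ within that interval is what the reduction factor measures.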

For $\psi_2(W \mid \psi_1)$ consider its conditional distribution for values of W restricted
as follows: $\psi_1(w) < g_1(u_1)$ or $\psi_1(w) = g_1(u_1)$ with the probability measure of
$\psi_1^{-1}(g_1(u_1))$ reduced by the factor $a$, where

$$a = \frac{u_1 - \inf g_1^{-1}(g_1(u_1))}{\sup g_1^{-1}(g_1(u_1)) - \inf g_1^{-1}(g_1(u_1))}.$$

Define the function $g_2(U_2 \mid u_1)$ of the uniformly distributed random variable
$U_2$ such that its distribution is identical to the above described distribution of
$\psi_2(w \mid g_1(u_1))$.

Similarly further functions $g_3(U_3 \mid u_1, u_2), \cdots$ of uniformly distributed random
variables $U_3, \cdots$ can be defined.

From the above construction of the mapping of $u_1, u_2, \cdots, u_m$ it is obvious
that the mapping of a sample of n from the uniform distribution on the product
space $[0, 1]^m$ yields n values of variates to be associated with the $\psi_1, \psi_2, \cdots, \psi_m$.
The distribution of the largest value of $U_1$ is the distribution of $\psi_1(w_{i(1)})$,
and similarly for the others. The mapping has reproduced the part of the
distribution of the $\psi_1, \cdots, \psi_m$ in which we are interested. Also we note that the
largest value of $U_1$ yields the largest value of $g_1(U_1)$, etc.

Apply our previous theorem to a sample of n from the uniform distribution
on the product space $[0, 1]^m$ with functions $u_1, u_2, \cdots, u_m$ and consider the
following sets:

$$S_1 = \{(U_1, \cdots, U_m) \mid U_1 > u_1(i(1))\},$$
$$S_2 = \{(U_1, \cdots, U_m) \mid U_1 < u_1(i(1)),\ U_2 > u_2(i(2))\},$$
$$\cdots$$
$$S_{m|n+1} = \{(U_1, \cdots, U_m) \mid U_1 < u_1(i(1)), \cdots, U_m < u_m(i(m))\}.$$
Also define:

$$S_1^* = \{g_1(U_1), \cdots, g_m(U_m \mid U_1, \cdots, U_{m-1}) \mid g_1(U_1) > g_1(u_1(i(1)))\},$$
$$S_2^* = \{g_1(U_1), \cdots \mid g_1(U_1) < g_1(u_1(i(1))),\ g_2(U_2) > g_2(u_2(i(2)))\},$$
$$\cdots$$
$$S_{m|n+1}^* = \{g_1(U_1), \cdots \mid g_1(U_1) < g_1(u_1(i(1))), \cdots\}.$$
$\bar{S}_1^*, \bar{S}_2^*, \cdots$ are defined as $S_1^*, S_2^*, \cdots$ except < is replaced by $\le$ and >
by $\ge$.
Consider now the inverse mapping of the sets $S_1^*, S_2^*, \cdots$ and $\bar{S}_1^*, \bar{S}_2^*, \cdots$
into the space of $(u_1, u_2, \cdots, u_m)$. We shall have

$$g^{-1}(S_i^*) \subset S_i \subset g^{-1}(\bar{S}_i^*),$$
because
$$g_i(u_i) > g_i(a) \Rightarrow u_i > a \Rightarrow g_i(u_i) \ge g_i(a).$$

Thus we have the following inequality for the corresponding coverages:
$$\operatorname{cov} S_i^* \le \operatorname{cov} S_i \le \operatorname{cov} \bar{S}_i^*.$$

The theorem follows directly from these relations and the theorem for the
continuous case.

12. Selection of the cutting function $\Phi_i$ by a random process. As indicated in
the third paragraph of the introduction, the general Theorem 8.1 still remains
valid if the functions are chosen by a random process from a class of such
functions. The particular class from which $\Phi_i$ is chosen may depend on the functions
previously selected ($\Phi_j$ with $j < i$) and on their values at the respective cuts.
The point is that Theorem 8.1 is true for any sequence $\Phi_1, \cdots, \Phi_m$ and
consequently is true when the sequence is chosen by a random process.



A METHOD OF INVESTIGATING THE EFFECT OF NONNORMALITY
AND HETEROGENEITY OF VARIANCE ON TESTS OF THE GENERAL
LINEAR HYPOTHESIS


By F. N. DAVID AND N. L. JOHNSON


University College, London


Summary. The method considered here for investigating the effect of non- 
normality and heterogeneity of variance on tests of the general linear hypothesis 
is based on finding the cumulants of a linear function of the two sums of squares 
used in the usual F-test. 


1. Introduction. Many of the standard forms of analysis of variance tests can
be shown to be the likelihood ratio tests for particular cases of the general linear
hypothesis [1]. It is assumed that there are n random variables $x_i$ and that

(1)  $x_i = a_{i1}\theta_1 + a_{i2}\theta_2 + \cdots + a_{is}\theta_s + z_i$   $(i = 1, 2, \cdots, n)$,

where the a's are known constants, the matrix $A = (a_{ij})$ is nonsingular, the
$\theta$'s are unknown parameters and the z's independent normal random variables
each with expected value zero and variance $\sigma^2$ (the case where the variances
are unequal but in known proportions is easily reduced to (1)). The general
linear hypothesis of order $p\ (\le s)$ states that $\theta_{s-p+1} = \theta_{s-p+2} = \cdots = \theta_s = 0$.
Kolodzieczyk [1] showed that the likelihood ratio criterion appropriate for
testing this hypothesis could be expressed in the form $(S_b/p)/(S_a/(n - s))$ where
$S_a$ is the minimum value of

$$\sum_{i=1}^{n} (x_i - a_{i1}\theta_1 - a_{i2}\theta_2 - \cdots - a_{is}\theta_s)^2$$

with respect to $\theta_1, \theta_2, \cdots, \theta_s$ and $S_r = S_a + S_b$ is the minimum value of

$$\sum_{i=1}^{n} (x_i - a_{i1}\theta_1 - \cdots - a_{i,s-p}\theta_{s-p})^2$$

with respect to $\theta_1, \theta_2, \cdots, \theta_{s-p}$. Kolodzieczyk also showed that if the
hypothesis tested is valid, i.e., if

$$\theta_{s-p+1} = \theta_{s-p+2} = \cdots = \theta_s = 0,$$

then the likelihood ratio criterion is distributed as F with p, n - s degrees of
freedom. Accordingly if a test with level of significance $\alpha$ is required, then the
hypothesis is rejected if

$$\frac{S_b/p}{S_a/(n - s)} > F_{p,n-s,\alpha},$$

where $F_{p,n-s,\alpha}$ is the upper $100\alpha\%$ point of the F distribution with p, n - s
degrees of freedom.
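As a concrete instance of these definitions, consider the single classification (one mean per group), where both minimizations have closed forms. The sketch below is an illustration, not part of the original paper, and the function names are hypothetical: $S_a$ is the within-group sum of squares, $S_r$ the sum of squares about the grand mean, and $S_b = S_r - S_a$. With s groups the full model has s parameters and the hypothesis of equal means sets p = s - 1 of them to zero.

```python
def one_way_sums_of_squares(groups):
    """Return (S_a, S_b, S_r) for a list of groups of observations."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    # S_a: residual sum of squares after fitting one mean per group.
    s_a = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    # S_r: residual sum of squares after fitting a single common mean.
    s_r = sum((x - grand_mean) ** 2 for x in all_obs)
    return s_a, s_r - s_a, s_r


def f_ratio(groups):
    """The criterion (S_b / p) / (S_a / (n - s)) for the hypothesis of equal means."""
    s_a, s_b, _ = one_way_sums_of_squares(groups)
    n = sum(len(g) for g in groups)
    s = len(groups)  # parameters fitted under the full model
    p = s - 1        # parameters set to zero by the hypothesis
    return (s_b / p) / (s_a / (n - s))
```

For the groups `[[1, 2, 3], [2, 3, 4], [5, 6, 7]]` the within-group sum of squares is 6 and the between-group sum of squares is 26, so the criterion equals 13.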


2. Method of investigation. The power function of this test has been
investigated by Hsu [2] and Tang [6]. It is the purpose of the present paper to
outline a method and provide detailed formulae for investigating the significance
level and the power function of the test when the z's are, in fact, not necessarily
normally distributed and when their variances are not necessarily equal. A
further possible inadequacy in the theoretical model may lie in the omission of
certain parameters $\theta_{s+1}, \cdots, \theta_{s+u}$ in the derivation of the test. We shall
therefore assume that the correct theoretical model is

(2)  $x_i = a_{i1}\theta_1 + a_{i2}\theta_2 + \cdots + a_{is}\theta_s + \cdots + a_{i,s+u}\theta_{s+u} + z_i$   $(i = 1, 2, \cdots, n)$,

where the rth cumulant of $z_i$ is $\kappa_{ri}$. We retain the assumption that the z's are
mutually independent and have zero expected value. We shall seek to
approximate to the value of

(3)  $P\left\{\dfrac{S_b/p}{S_a/(n - s)} > F_{p,n-s,\alpha}\right\}$

under our general conditions. This expression may be rewritten as
$$P\{S_r - CS_a > 0\},$$
where
$$C = 1 + pF_{p,n-s,\alpha}/(n - s).$$


This suggests that we may confine our attention to a study of $S_r - CS_a$, and
we note that the moments of this function may be written down exactly. Our
procedure will be to calculate these moments and to approximate to the
required distribution of $S_r - CS_a$ by choosing some form of distribution function
which will have the same first four moments. In certain particular cases [3]
where exact values are available this method gives usefully accurate results.
We have found that the $(\beta_1, \beta_2)$ points of $S_r - CS_a$ correspond, in general, to
curves of Pearson Type IV. Further calculations indicate that where the z's are
normally distributed Type IV curves give rather better results than the curves
used in [3], which were Gram-Charlier Type A and curves of type $S_U$ described
by Johnson [7]. However, differences between probability integrals estimated by
all three curves were not large, and if the approximate evaluation of a power
function is desired it will not matter greatly which system of curves is used.
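The rewriting of the rejection region as $S_r - CS_a > 0$ is a one-line rearrangement, since $S_b = S_r - S_a$ and $S_a > 0$. The sketch below simply evaluates both forms so that their agreement can be checked numerically; the function names are hypothetical and the critical value passed in is whatever tabulated value of $F_{p,n-s,\alpha}$ the reader supplies.

```python
def rejects_via_ratio(s_r, s_a, p, n, s, f_crit):
    # Original form of the test: (S_b / p) / (S_a / (n - s)) > F.
    return ((s_r - s_a) / p) / (s_a / (n - s)) > f_crit


def rejects_via_linear_form(s_r, s_a, p, n, s, f_crit):
    # Equivalent form: S_r - C * S_a > 0 with C = 1 + p * F / (n - s).
    c = 1 + p * f_crit / (n - s)
    return s_r - c * s_a > 0
```

The linear form is what makes the moment calculations tractable: $S_r - CS_a$ is a quadratic form in the observations, whereas the ratio is not.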


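3. Canonical form of the problem. The system of equations (2) relating
$x_1, x_2, \cdots, x_n$ with the parameters $\theta_1, \theta_2, \cdots, \theta_{s+u}$ and the random variables
$z_1, z_2, \cdots, z_n$ may be written in matrix form

(4)  $x = \theta A' + z,$

where

$$x = (x_1, x_2, \cdots, x_n); \quad \theta = (\theta_1, \theta_2, \cdots, \theta_{s+u}); \quad z = (z_1, z_2, \cdots, z_n);$$
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1,s+u} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{n,s+u} \end{pmatrix}.$$

Suppose A is partitioned between the (s - p)th and (s - p + 1)th columns
and between the sth and (s + 1)th columns so that

(5)  $A = (A_{(1)}, A_{(2)}, A_{(3)}) = (A_{(1)}, A_{(23)}),$
that is,
$$A_{(23)} = (A_{(2)}, A_{(3)}).$$
Let $\theta$ be similarly partitioned so that

(6)  $\theta = (\theta_{(1)}, \theta_{(2)}, \theta_{(3)}) = (\theta_{(1)}, \theta_{(23)}).$
Then it can be shown (see [4]) that

(7)  $S_a = (\theta_{(3)}A'_{(3)} + z)M_a(\theta_{(3)}A'_{(3)} + z)',$
     $S_r = (\theta_{(23)}A'_{(23)} + z)M_r(\theta_{(23)}A'_{(23)} + z)',$
where, with $A_{(12)} = (A_{(1)}, A_{(2)})$,

(8)  $M_a = I - A_{(12)}(A'_{(12)}A_{(12)})^{-1}A'_{(12)},$
     $M_r = I - A_{(1)}(A'_{(1)}A_{(1)})^{-1}A'_{(1)}.$

Hence

(9)  $S_a = \sum_i \sum_j m_{ij}(z_i + D_i)(z_j + D_j),$
     $S_r = \sum_i \sum_j m'_{ij}(z_i + D'_i)(z_j + D'_j),$
where
$$M_a = (m_{ij}), \quad M_r = (m'_{ij}),$$
and

(10)  $D_i = a_{i,s+1}\theta_{s+1} + \cdots + a_{i,s+u}\theta_{s+u},$
      $D'_i = a_{i,s-p+1}\theta_{s-p+1} + \cdots + a_{i,s+u}\theta_{s+u} = a_{i,s-p+1}\theta_{s-p+1} + \cdots + a_{is}\theta_s + D_i.$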


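4. Moments of $S_r$ and $S_a$. Using David and Kendall's tables of symmetric
functions [5], it is a simple matter to write down the cumulants of $S_r$ and $S_a$.
Thus if

$$\delta'_i = \sum_{j=1}^{n} m'_{ij}D'_j, \qquad \sum_{i,j=1}^{n} m'_{ij}D'_iD'_j = \Delta'_D,$$

and we adopt the convention

$$\mu(S_r^kS_a^l) = \mathcal{E}[(S_r - \mathcal{E}(S_r))^k(S_a - \mathcal{E}(S_a))^l],$$

with $\kappa(S_r^kS_a^l)$ denoting the corresponding cumulant, we have, all summations
running from 1 to n,

$$\mathcal{E}(S_r) = \sum_i m'_{ii}\kappa_{2i} + \Delta'_D,$$

$$\kappa(S_r^2) = \sum_i m'^2_{ii}\kappa_{4i} + 2\sum_i\sum_j m'^2_{ij}\kappa_{2i}\kappa_{2j} + 4\sum_i m'_{ii}\delta'_i\kappa_{3i} + 4\sum_i \delta'^2_i\kappa_{2i},$$

$$\kappa(S_r^3) = \sum_i m'^3_{ii}\kappa_{6i} + 12\sum_i\sum_j m'_{ii}m'^2_{ij}\kappa_{4i}\kappa_{2j} + 6\sum_i\sum_j m'_{ii}m'_{ij}m'_{jj}\kappa_{3i}\kappa_{3j}$$
$$+ 4\sum_i\sum_j m'^3_{ij}\kappa_{3i}\kappa_{3j} + 8\sum_i\sum_j\sum_t m'_{ij}m'_{it}m'_{jt}\kappa_{2i}\kappa_{2j}\kappa_{2t} + 6\sum_i m'^2_{ii}\delta'_i\kappa_{5i}$$
$$+ 24\sum_i\sum_j m'_{ii}m'_{ij}\delta'_j\kappa_{3i}\kappa_{2j} + 24\sum_i\sum_j m'^2_{ij}\delta'_i\kappa_{3i}\kappa_{2j} + 12\sum_i m'_{ii}\delta'^2_i\kappa_{4i}$$
$$+ 24\sum_i\sum_j m'_{ij}\delta'_i\delta'_j\kappa_{2i}\kappa_{2j} + 8\sum_i \delta'^3_i\kappa_{3i},$$

$\kappa(S_r^4)$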
= a M44 Kai a 24 » 2, M5 M4; KEG Kj te 24 > > M5 N45 MN; 5 KaG Kaj 
i é i ey 


+8 a x Mi; Kaskaj + 24 X x Mi, M44 M5 Ki kaj + 32 X 2» Ms Mi} Koikas 
+ 48 2X x 2» mi; m,, K4aikojko + 96 x 2. Do megs mss mig mje Kas kag Has 

+ 48 X X a Mi Mie M; 5 Mie Kaikajku + 96 X x Xu Mig M55 M52 Kai Kaj Kas 
+ 96 X x d mi; mit Mjekaikajke + 48 » » dX Dimi sm sem ipkaska Kevkae 
+8 2 mi28; Kr + 48 x 2X mii mi; 8; Ksike; + 96 » » mma 8; ksi Kas 
+ 96 X x mim; mM; 56 iKsiks; + 96 X De mssms38; K4ik3j 

+ 64 > 2X mi} 8; Kaika; + 192 X x » ms Mj Me Be Kae Koj Kee 

+ 192 2X 2d dX mi; Mie Bs Kai Koj Ke + 192 x 2d » mij Mit Mj 4 KaiKa5 Ket 


+ 24 ye mii 8; Kes + 192 yy >; ms: 5845 Kai Ke + 96 » > mij &: Kai Kaj 
. ‘ 3 . ? 

+ 96 > - msi 5585 Kai Ka; + 96 >. > m3 5; 8; K3ik3; 
‘ 7 + 7 


+192 >> >> > Mi; Mier 8; Be Koi Ko; Kae + 32 > miid;° ksi 
a 7 ‘ ‘ 


+ 192 D2 Do mis8,7 5; Kaine; + 16 Do 84 Kas. 
+ 7 + 


The cumulants of S, are identical in structure with those of S,, the only dif- 
ference being that all the primes are dropped. 
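The first two cumulants of a quadratic form $S = \sum m_{ij}(z_i + D_i)(z_j + D_j)$ in independent variates, namely $\sum m_{ii}\kappa_2 + \Delta$ and $\sum m_{ii}^2\kappa_4 + 2\sum\sum m_{ij}^2\kappa_2^2 + 4\sum m_{ii}\delta_i\kappa_3 + 4\sum\delta_i^2\kappa_2$, can be checked exactly on a small example by enumerating every outcome. The sketch below is an illustration only: it assumes all $z_i$ share the same two-point distribution (an assumption made here, not in the paper), uses exact fractions, and its function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def mean_and_variance_formula(m, d, k2, k3, k4):
    """First two cumulants of S when every z_i has cumulants k2, k3, k4."""
    n = len(m)
    delta = [sum(m[i][j] * d[j] for j in range(n)) for i in range(n)]
    big_delta = sum(m[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    mean = sum(m[i][i] for i in range(n)) * k2 + big_delta
    var = (sum(m[i][i] ** 2 for i in range(n)) * k4
           + 2 * sum(m[i][j] ** 2 for i in range(n) for j in range(n)) * k2 ** 2
           + 4 * sum(m[i][i] * delta[i] for i in range(n)) * k3
           + 4 * sum(x * x for x in delta) * k2)
    return mean, var


def mean_and_variance_exact(m, d, support):
    """Same quantities by enumeration; support is a list of (value, probability)."""
    n = len(m)
    e1 = e2 = Fraction(0)
    for outcome in product(support, repeat=n):
        prob = Fraction(1)
        for _, p in outcome:
            prob *= p
        y = [v + d[i] for i, (v, _) in enumerate(outcome)]
        s = sum(m[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
        e1 += prob * s
        e2 += prob * s * s
    return e1, e2 - e1 * e1
```

A variate taking the value -1 with probability 2/3 and 2 with probability 1/3 has mean zero and cumulants $\kappa_2 = 2$, $\kappa_3 = 2$, $\kappa_4 = -6$, which exercises the skewness term as well as the kurtosis term.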







5. Cross-cumulants. Because of this identical structure of $S_r$ and $S_a$ it is
an easy matter to write down the cross-cumulants. It is simple to show that

$$\kappa(S_rS_a) = \sum_i m'_{ii}m_{ii}\kappa_{4i} + 2\sum_i\sum_j m'_{ij}m_{ij}\kappa_{2i}\kappa_{2j} + 2\sum_i (m'_{ii}\delta_i + m_{ii}\delta'_i)\kappa_{3i} + 4\sum_i \delta'_i\delta_i\kappa_{2i}$$
by elementary algebra. The result may also be reached by regarding the
coefficients of the $\kappa$'s in $\kappa(S_r^2)$ as undashed and splitting the numerical multipliers
according to the number of ways in which one algebraic quantity may be dashed
and the other left undashed. Again


x(S; Sa) = » MiMi koi + +8 » 2» mss; Mi; Kika; + 4 dX 2X MM es ks; Ke 
2 a x Mis Mig M55 KgiKksj + 4 a X Mii Mi; M5; Koi Ks; 
+4 :® > om; m:; K3iksj + 8 > 7 Zz mi; Mie Mje Kai Ko, Kt 
1 j 1 i t 
+ 2 p mii di Ks: + 4 i MiMi d; Kei sib Mi 4; 85 Kg Ke; 
‘ s ' 3 
- 8 >. > mi; 3i Ks: K2; +8 pe Hs M5 Mj; jK3iK2; +8 > 7 Mi mi; 8; K3i K2j 
+ ? + 7 ‘ 3 


+ 16 >. Dd mij mij 8; «3iK2; + 8 D> Micdc5;Kas + 4D. Md, Ka; 


+ 16 >, D> mij8¢8; meine; + 8D. Dd, i858; maine; + 8D, 8: d:a:5, 


a result which may be reached either by elementary algebra or by making a
dichotomy of the numerical multipliers according to the number of ways in
which two coefficients may be dashed and the other left undashed in the
expression for $\kappa(S_r^3)$. The expression for $\kappa(S_rS_a^2)$ follows by symmetry. We have
worked out the cross-cumulants of order 4 by two methods but since they are
very long expressions and are easily reached from $\kappa(S_r^4)$ by the combinatorial
method we have briefly described we do not reproduce them here.


6. Determinantal expressions. So far we have left the expressions for the 
cumulants in canonical form. It is clearly desirable to be able to calculate them 
quickly by means of determinants. Write 


$$G_{jk} = \sum_i a_{ij}a_{ik}$$
and
Gu
v Y ’
Qi,e—p rt? G...9.0-8 | Aj ,s—p Ge—pa <F Gs—p,2—p |


The similar quantities without primes will have the same determinantal form 
with the exception that the determinants will be of order (s + 1), instead of 
(s — p + 1). Then it is easy to show that 


$$m'_{ij} = -a'_{ij}/A' \quad \text{when } i \ne j,$$
$$m'_{ii} = 1 - a'_{ii}/A' = A'_{ii}/A'.$$
Again it may be shown that 


0 a aa D; 13° Zz; Gi.» Di 


Gu ee Grey 


y Y 
Qj .s—p Go» i G.5.0~» 


a similar determinant of order (s + 1) being the expression for 4, . 
Finally 


VDF DYLaaDi --- Vain-pDi 


- aa D’ Gy = Gy, Pp 
a> = LD mi DiD; = 5 


, ’ ’ 
7 Qis—pD; (rs. p.l es Gry 
‘ 


p.t—p 


again the expressions without primes having the same determinantal form but 
being determinants of order (s + 1) instead of (s — p + 1). 


7. Inequality of variances in the normal case. In most investigations it will
be the case that the algebraic form of $S_a$, the fundamental sum of squares, will
be an adequate setup for the hypothesis tested. This being so we have $\delta_i = 0$,
which will result in a considerable simplification of the cumulants of $S_a$ and of
the cross-cumulants. We shall assume that this is the case for the rest of this
work and shall not discuss it further here. We retain, however, the noncentral
factors $\delta'_i$. There now appear to be two main (different) simplifications which
can be made. It may be assumed that the z's are normally distributed each with
a different variance, or it may be assumed that each $z_i$ has the same nonnormal
distribution (whatever i). We treat the normal case first, and the cumulants of
Sections 4 and 5 reduce to

$$\mathcal{E}(S_r) = \sum_i m'_{ii}\kappa_{2i} + \Delta'_D,$$
$$\mathcal{E}(S_a) = \sum_i m_{ii}\kappa_{2i},$$
$$\kappa(S_r^2) = 2\sum_i\sum_j m'^2_{ij}\kappa_{2i}\kappa_{2j} + 4\sum_i \delta'^2_i\kappa_{2i},$$
$$\kappa(S_rS_a) = 2\sum_i\sum_j m'_{ij}m_{ij}\kappa_{2i}\kappa_{2j},$$
$$\kappa(S_a^2) = 2\sum_i\sum_j m^2_{ij}\kappa_{2i}\kappa_{2j},$$
$$\kappa(S_r^3) = 8\sum_i\sum_j\sum_t m'_{ij}m'_{it}m'_{jt}\kappa_{2i}\kappa_{2j}\kappa_{2t} + 24\sum_i\sum_j m'_{ij}\delta'_i\delta'_j\kappa_{2i}\kappa_{2j},$$
$$\kappa(S_r^2S_a) = 8\sum_i\sum_j\sum_t m'_{ij}m'_{it}m_{jt}\kappa_{2i}\kappa_{2j}\kappa_{2t} + 8\sum_i\sum_j m_{ij}\delta'_i\delta'_j\kappa_{2i}\kappa_{2j},$$
$$\kappa(S_rS_a^2) = 8\sum_i\sum_j\sum_t m'_{ij}m_{it}m_{jt}\kappa_{2i}\kappa_{2j}\kappa_{2t},$$
$$\kappa(S_a^3) = 8\sum_i\sum_j\sum_t m_{ij}m_{it}m_{jt}\kappa_{2i}\kappa_{2j}\kappa_{2t},$$
$$\kappa(S_r^4) = 48\sum_i\sum_j\sum_t\sum_r m'_{ij}m'_{it}m'_{jr}m'_{tr}\kappa_{2i}\kappa_{2j}\kappa_{2t}\kappa_{2r} + 192\sum_i\sum_j\sum_t m'_{ij}m'_{it}\delta'_j\delta'_t\kappa_{2i}\kappa_{2j}\kappa_{2t},$$
$$\kappa(S_r^3S_a) = 48\sum_i\sum_j\sum_t\sum_r m'_{ij}m'_{it}m'_{jr}m_{tr}\kappa_{2i}\kappa_{2j}\kappa_{2t}\kappa_{2r} + 96\sum_i\sum_j\sum_t m'_{ij}m_{it}\delta'_j\delta'_t\kappa_{2i}\kappa_{2j}\kappa_{2t},$$
$$\kappa(S_r^2S_a^2) = 48\sum_i\sum_j\sum_t\sum_r m'_{ij}m'_{it}m_{jr}m_{tr}\kappa_{2i}\kappa_{2j}\kappa_{2t}\kappa_{2r} + 32\sum_i\sum_j\sum_t m_{ij}m_{it}\delta'_j\delta'_t\kappa_{2i}\kappa_{2j}\kappa_{2t},$$
$$\kappa(S_rS_a^3) = 48\sum_i\sum_j\sum_t\sum_r m'_{ij}m_{it}m_{jr}m_{tr}\kappa_{2i}\kappa_{2j}\kappa_{2t}\kappa_{2r},$$
$$\kappa(S_a^4) = 48\sum_i\sum_j\sum_t\sum_r m_{ij}m_{it}m_{jr}m_{tr}\kappa_{2i}\kappa_{2j}\kappa_{2t}\kappa_{2r}.$$

As a check we may notice that if we put
$$\kappa_{2i} = \kappa_{2j} = \kappa_{2r} = \kappa_{2t},$$
whatever be i, j, r and t, then the cross-cumulants $\kappa(S_a^kS_b^l)$ vanish as is expected.
By approximating to the distribution of $S_r - CS_a$, for example, by assuming
that it is a Pearson Type IV curve with moments derived from those above, we
may, by putting the noncentrality factor $\delta'_i = 0$, investigate the effect of
heterogeneity of variances on the nominal significance level, $\alpha$, in any analysis of
variance problem for which the general linear hypothesis is appropriate; or if $\delta'_i$ is
given certain values we may find out the effect on the power function. A check
on the adequacy of the Type IV approximation can be made at any stage by
comparing the approximation against known values (i.e., when the variances
are equal). Certain refinements in the design of experiments are possible by this
approach. For example, if heterogeneity of variances is suspected as being a
factor which may enter into an experiment, it is possible to decide beforehand what
will be the appropriate dichotomy of N, the number of trials, in order that this
heterogeneity shall have as small an effect as possible on the nominal significance
level $\alpha$, under $H_0$. Further, if by any chance estimates of this heterogeneity can
be made from previous experiments, then an optimum dichotomy of N can be
made.


8. Nonnormality. If it is assumed that each of the z's has the same nonnormal
distribution then certain other simplifications become possible. It is felt that
space will not permit a full list of the determinantal reductions, but by the aid
of such relations as

$$\sum_i\sum_j m^2_{ij} = n - s,$$

$$\sum_i\sum_j\sum_t m_{ij}m_{it}m_{jt} = n - s,$$

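and so on, the expressions for the moments become shortened. For illustration
we give the results for orders one and two.

$$\mathcal{E}(S_r) = (n - s + p)\kappa_2 + \Delta'_D, \qquad \mathcal{E}(S_a) = (n - s)\kappa_2,$$
$$\kappa(S_r^2) = \kappa_4\Big(n - 2s + 2p + A'^{-2}\sum_i a'^2_{ii}\Big) + 2\kappa_2^2(n - s + p) + 4\kappa_3A'^{-1}\sum_i A'_{ii}\delta'_i + 4\kappa_2\Delta'_D,$$

$$\kappa(S_a^2) = \kappa_4\Big(n - 2s + A^{-2}\sum_i a^2_{ii}\Big) + 2\kappa_2^2(n - s),$$

$$\kappa(S_aS_b) = \kappa_4\Big(p + (AA')^{-1}\sum_i a_{ii}a'_{ii} - A^{-2}\sum_i a^2_{ii}\Big) + 2\kappa_3\sum_i m_{ii}\delta'_i.$$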


It will be noted that the cross-cumulant $\kappa(S_aS_b)$ does not contain a term in $\kappa_2$,
and this, as might be expected, will be true whatever the order of the
cross-cumulant. By the aid of these moments the effect on the F-ratio test of neglecting
to make a normalising transformation of the original data can be studied. For
example, if we put all the cumulants equal to the Poisson parameter, $\lambda$, this
will enable us to see the result of neglecting the $\sinh^{-1}$ transformation where
this transformation is necessary. If we retain the $\Delta'_D$ term we find the effect of
nonnormality on the power function and hence can investigate the variation of
sample sizes from those which in the normal case supposedly give a required
power.

9. Special cases. Although the determinants of Section 6 enable the moments
of $S_r - CS_a$ to be calculated quickly, certain difficulties are encountered in
translating the various quantities involved in the determinants into terms of the
different "setups" of the analysis of variance. We give therefore as illustrations
the evaluation of the necessary determinants for two classical types.

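9.1. Single classification. The model is

$$x_{ti} = A + C_t + z_{ti} \qquad (t = 1, \cdots, s;\ i = 1, \cdots, n_t).$$

It is assumed that there are s groups with $n_t$ observations in the tth group, A,
$C_t$ are parameters, z's independent random variables, and $\sum_t n_t = N$,
$\sum_t n_tC_t = 0$. Let the N observations be $1, 2, \cdots, i, \cdots, N$. In the
tth group

$$A_{ii}/A = (n_t - 1)/n_t, \qquad A'_{ii}/A' = (N - 1)/N,$$
$$a_{ii}/A = 1/n_t, \qquad a'_{ii}/A' = 1/N.$$
If both i and j are in the tth group,
$$a_{ij}/A = 1/n_t, \qquad a'_{ij}/A' = 1/N.$$
If i and j are not in the same group,
$$a_{ij}/A = 0, \qquad a'_{ij}/A' = 1/N.$$

$C_t$ is the difference between the expectation in the tth group and the expectation
over all groups. Hence

$$\Delta'_D = \sum_{t=1}^{s} n_tC_t^2.$$

It is seen that $\Delta'_D$ is thus a measure of noncentrality. The sums of squares
appropriate to testing the hypothesis $C_t = 0$ $(t = 1, \cdots, s)$ are

$$S_a = \sum_{t=1}^{s}\sum_{i=1}^{n_t}(x_{ti} - \bar{x}_{t.})^2, \qquad S_r = \sum_{t=1}^{s}\sum_{i=1}^{n_t}(x_{ti} - \bar{x}_{..})^2,$$

where

$$\bar{x}_{t.} = \frac{1}{n_t}\sum_{i=1}^{n_t}x_{ti}, \qquad \bar{x}_{..} = \frac{1}{N}\sum_{t=1}^{s}n_t\bar{x}_{t.},$$

and we have then

$$\mathcal{E}(S_r) = \sum_{t=1}^{s}n_t\Big(1 - \frac{1}{N}\Big)\kappa_{2t} + \sum_{t=1}^{s}n_tC_t^2, \qquad \mathcal{E}(S_a) = \sum_{t=1}^{s}(n_t - 1)\kappa_{2t},$$

$$\kappa(S_r^2) = \sum_t n_t\Big(1 - \frac{1}{N}\Big)^2\kappa_{4t} + 2\sum_t\Big[n_t\Big(1 - \frac{1}{N}\Big)^2 + \frac{n_t(n_t - 1)}{N^2}\Big]\kappa_{2t}^2$$
$$+ 2\sum_{t \ne t'}\frac{n_tn_{t'}}{N^2}\kappa_{2t}\kappa_{2t'} + 4\sum_t n_t\Big(1 - \frac{1}{N}\Big)C_t\kappa_{3t} + 4\sum_t n_tC_t^2\kappa_{2t},$$

and so on. These results were given in full in [3].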
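9.2. Analysis of regressions. The model is now

$$y_{ti} = \alpha + \beta(x_t - \bar{x}) + B_t + z_{ti},$$

it being assumed that there are $n_t$ observations at $x_t$ and $t = 1, \cdots, s$.
$\alpha$, $\beta$ and $B_t$ are unknown parameters and

$$\sum_{t=1}^{s}n_t = N, \qquad \sum_{t=1}^{s}n_tB_t = 0, \qquad \sum_{t=1}^{s}n_tB_tx_t = 0, \qquad \bar{x} = N^{-1}\sum_{t=1}^{s}n_tx_t.$$

The determinants in the tth group are
$$A_{ii}/A = (n_t - 1)/n_t, \qquad A'_{ii}/A' = 1 - \Big(N^{-1} + X_t^2\Big/\sum_t n_tX_t^2\Big),$$
where
$$X_t = x_t - \bar{x};$$
$$a_{ii}/A = 1/n_t, \qquad a'_{ii}/A' = N^{-1} + X_t^2\Big/\sum_t n_tX_t^2.$$

If both i and j are in the tth group
$$a_{ij}/A = 1/n_t, \qquad a'_{ij}/A' = N^{-1} + X_t^2\Big/\sum_t n_tX_t^2.$$

If i and j are not in the same group, but in the tth and t'th groups respectively
$$a_{ij}/A = 0, \qquad a'_{ij}/A' = N^{-1} + X_tX_{t'}\Big/\sum_t n_tX_t^2.$$

If we are testing for departure from linearity, the hypothesis to be tested is
$B_t = 0$ $(t = 1, \cdots, s)$. In this case $\delta'_i = B_t$, $\Delta'_D = \sum_t n_tB_t^2$. The fundamental
sums of squares are

$$S_a = \sum_t\sum_i (y_{ti} - \bar{y}_{t.})^2, \qquad S_r = \sum_t\sum_i (y_{ti} - \bar{y}_{..} - b(x_t - \bar{x}))^2,$$
where
$$\bar{y}_{t.} = \frac{1}{n_t}\sum_i y_{ti}, \qquad \bar{y}_{..} = \frac{1}{N}\sum_t n_t\bar{y}_{t.}, \qquad b = \frac{\sum_t n_t(x_t - \bar{x})(\bar{y}_{t.} - \bar{y}_{..})}{\sum_t n_t(x_t - \bar{x})^2}.$$

We have therefore, for example,

$$\mathcal{E}(S_r) = \sum_i m'_{ii}\kappa_{2i} + \Delta'_D = \sum_t n_t\Big(1 - N^{-1} - X_t^2\Big/\sum_t n_tX_t^2\Big)\kappa_{2t} + \sum_t n_tB_t^2,$$

$$\kappa(S_r^2) = \sum_i m'^2_{ii}\kappa_{4i} + 2\sum_i\sum_j m'^2_{ij}\kappa_{2i}\kappa_{2j} + 4\sum_i m'_{ii}\delta'_i\kappa_{3i} + 4\sum_i \delta'^2_i\kappa_{2i}$$
$$= \sum_t n_t\Big(1 - N^{-1} - X_t^2\Big/\sum_t n_tX_t^2\Big)^2\kappa_{4t}$$
$$+ 2\sum_t\Big[n_t\Big(1 - N^{-1} - X_t^2\Big/\sum_t n_tX_t^2\Big)^2 + n_t(n_t - 1)\Big(N^{-1} + X_t^2\Big/\sum_t n_tX_t^2\Big)^2\Big]\kappa_{2t}^2$$
$$+ 2\sum_{t \ne t'}n_tn_{t'}\Big(N^{-1} + X_tX_{t'}\Big/\sum_t n_tX_t^2\Big)^2\kappa_{2t}\kappa_{2t'}$$
$$+ 4\sum_t n_t\Big(1 - N^{-1} - X_t^2\Big/\sum_t n_tX_t^2\Big)B_t\kappa_{3t} + 4\sum_t n_tB_t^2\kappa_{2t}.$$

The other cumulants follow similarly by substitution.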


10. Conclusion. It is believed that the method described above provides a 
useful means of investigating the effect of various forms of departure from the 
theoretical models used in the analysis of variance. While we have only discussed 
the systematic (or parametric) form of model in the paper, a similar approach 
has been found useful in the case of the random (or components of variance) 
model and also in investigations of the distributions arising in randomization 
theory. There is, of course, some uncertainty about the accuracy of the
probabilities obtained from the curves fitted to the moments of $S_r - CS_a$.
Numerical work so far carried out indicates that, at any rate for the parametric
model, this method provides an adequate mode of approximation.

In the most general case calculation of fifth and higher order moments and
product moments appears to be prohibitively lengthy in practice, but it would
be comparatively easy to calculate moments of higher order in some of the
simpler cases (e.g., where there is normal variation). It is possible that closer
approximations to the required probabilities could thereby be obtained.



ON THE TRANSLATION PARAMETER PROBLEM FOR
DISCRETE VARIABLES¹


By DAVID BLACKWELL
Stanford University

Summary. For any chance variable $x = (x_1, \cdots, x_N)$ having known
distribution, the translation parameter estimation problem is to estimate an
unknown constant h, having observed $y = (x_1 + h, \cdots, x_N + h)$. Extending
the work of Pitman [2], Girshick and Savage [1] have, for any loss function
depending only on the error of estimate, described an estimate whose risk is a
constant R independent of h, and have shown that under certain hypotheses
their estimate is minimax. We investigate whether the Girshick-Savage estimate
is admissible, i.e., whether it is impossible to find an estimate with risk
$R(h) \le R$ for all h and actual inequality for some h. We consider only bounded discrete
variables x, and show that, if all values of x have all integer coordinates and if
the loss f(d) from an error d is, for instance, strictly convex and assumes its
minimum value, the Girshick-Savage estimate is admissible. Two examples in
which the Girshick-Savage estimate is not admissible are given.


1. Preliminaries. Let $r_1, \ldots, r_k$ be distinct points in the hyperplane 
$\sum_{i=1}^{N} x_i = 0$ in Euclidean N-space $R_N$, let $s_{ij}$, $i = 1, \ldots, k$; $j = 1, \ldots, m_i$, 
be real numbers with $s_{ij} \ne s_{ij'}$ whenever $j \ne j'$, and define $v_{ij} = r_i + \epsilon s_{ij}$, 
where $\epsilon = (1, 1, \ldots, 1)$. Let $a_i > 0$, $p_{ij} > 0$ be numbers such that $\sum_i a_i = 1$, 
$\sum_j p_{ij} = 1$ for each $i$, and let $x$ be a chance variable such that $P\{x = v_{ij}\} = 
a_i p_{ij}$. Clearly any N-dimensional chance variable $x$ assuming only a finite 
number of values can be represented in this way. The translation parameter 
estimation problem is to estimate the value of an unknown constant $h$, having 
observed $y = x + \epsilon h$. An estimate for $h$ is then a real valued function $t(y)$, de- 
fined for all vectors $y = r_i + \epsilon s$, $i = 1, \ldots, k$, $-\infty < s < \infty$, specifying the 
estimated value of $h$ as a function of the observation $y$. We shall suppose that 
the loss to the statistician depends only on the error $d = t(x + \epsilon h) - h$, and is 
given by a nonnegative function $f(d)$ defined for all real $d$. For a given $h$, the 
risk, i.e., the expected loss, from an estimate $t$ is 


\[ R(h) = \sum_{i,j} a_i p_{ij} f[t(v_{ij} + \epsilon h) - h]. \]


For any estimate $t$, the quantity $y - \epsilon t(y) = u(y)$ can be considered as an 
estimate of the value of $x$; in terms of $u$, the absolute value of the error is $|d| = 
N^{-1/2} |u(y) - x|$, where $|v|$ denotes the length of the vector $v$. In terms of $u$, 
the Girshick-Savage estimate becomes an extremely natural one. If we repre- 
sent $x$ as $r + \epsilon s$, where the sum of the components of $r$ is 0, $-\infty < s < \infty$, the 
observation of $y$ determines $r$, and gives certain information about $s$ which it is 
hard to utilize unless one has a priori ideas about $h$. The Girshick-Savage esti- 
mate simply ignores whatever information $y$ contains about $s$, and makes $u$ a 
function of $r$ only. If we are given $r = r_i$, the conditional distribution of $x$ is 
$P\{x = r_i + \epsilon s_{ij}\} = p_{ij}$, and, for $u(r_i) = r_i + \epsilon w$, the conditional risk is $Q_i(w) = 
\sum_j p_{ij} f(s_{ij} - w)$. If $\inf_w Q_i(w) = R_i$, and $W_i$ is the set of real numbers $w$ with 
$Q_i(w) = R_i$, the Girshick-Savage estimates are the estimates $u(r)$ such that 
$u(r_i) = r_i + \epsilon w_i$, with $w_i \in W_i$. The risk from any Girshick-Savage estimate 
is $R = \sum_i a_i R_i$ for all $h$. 
Any estimate $u(y)$ is specified by $k$ real functions $z_1(s), \ldots, z_k(s)$: when 
$y = r_i + \epsilon s$, $u(y) = r_i + \epsilon z_i(s)$; and conversely every set of $k$ functions deter- 
mines an estimate. The corresponding estimate of $h$ is $t(y) = s - z_i(s)$. The risk 
is 


\[ R(h) = \sum_{i=1}^{k} a_i \sum_{j=1}^{m_i} p_{ij} f(s_{ij} - z_i(s_{ij} + h)). \]

Thus formulated, the N-dimensional estimation problem is simply a collection 
of $k$ one-dimensional problems, with the particular one-dimensional problem to 
be faced by the statistician selected according to the probabilities $a_1, \ldots, a_k$. 
This fact enables us to restrict attention largely to one-dimensional problems. 
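
As an illustration only (it is not part of the original paper), the risk formula of this section can be evaluated exactly for a small hypothetical example. The sketch below assumes squared-error loss $f(d) = d^2$, for which each $Q_i(w)$ is minimized at the conditional mean of the $s_{ij}$; the weights `a`, the points `s`, and the probabilities `p` are made-up values.

```python
from fractions import Fraction

# Hypothetical example with k = 2 lines; every number here is an assumption.
a = [Fraction(1, 4), Fraction(3, 4)]                 # a_i
s = [[0, 2], [-1, 0, 1]]                             # s_ij, distinct within each i
p = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]]  # p_ij


def f(d):
    """Squared-error loss (assumed for this sketch)."""
    return d * d


def Q(i, w):
    """Conditional risk Q_i(w) = sum_j p_ij f(s_ij - w)."""
    return sum(pij * f(sij - w) for sij, pij in zip(s[i], p[i]))


def risk(z, h):
    """R(h) = sum_i a_i sum_j p_ij f(s_ij - z_i(s_ij + h))."""
    return sum(
        a[i] * sum(pij * f(sij - z[i](sij + h)) for sij, pij in zip(s[i], p[i]))
        for i in range(len(a))
    )


# Under squared error the minimizer w_i of Q_i is the conditional mean.
w = [sum(pij * sij for sij, pij in zip(s[i], p[i])) for i in range(len(a))]
R = sum(a[i] * Q(i, w[i]) for i in range(len(a)))

# Girshick-Savage estimate: z_i(s) = w_i, a constant that ignores s.
gs = [lambda t, wi=wi: wi for wi in w]
print(w, R, [risk(gs, h) for h in (-3, 0, 5)])
```

Because the functions $z_i$ are constant, the computed risk does not depend on $h$, which is the constant-risk property stated above.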


2. The main result. We have seen that the risk from a Girshick-Savage esti- 
mate is a constant $R$ independent of $h$. A question raised by Girshick and Savage 
is whether their estimate is admissible, i.e., whether it is impossible to find an- 
other estimate with $R(h) \le R$ with actual inequality for some $h$. The theorem 
of this section gives some conditions under which the Girshick-Savage estimate 
is admissible; essentially the result is that, for bounded variables $x$ for which 
all $s_{ij}$ are integers, and strictly convex loss functions $f(d)$ with $f(d) \to \infty$ as 
$d \to \pm\infty$, the estimate is admissible. Some cases in which the estimate is not 
admissible are described in the next section. The main result is a consequence of 
the following lemma, an analogue of which has been obtained by Lehmann [oral 
communication] for normally distributed $x$'s. 

Lemma. If all $s_{ij}$ are integers and if $f(d)$ is continuous and such that (a) each 
$Q_i(w) = \sum_j p_{ij} f(s_{ij} - w)$ assumes its minimum $R_i$ at a unique point $w_i$, i.e., the 
Girshick-Savage estimate exists and is unique, and (b) $Q_i(d_n) \to R_i$ as $n \to \infty$ 
implies $d_n \to w_i$, then for any estimate $z_1(s), \ldots, z_k(s)$ we have 


\[ \lim_{\substack{A \to -\infty \\ B \to \infty}} \sum_{h=A}^{B} [R(h) - R] = \sum_{i=1}^{k} a_i \sum_{s=-\infty}^{\infty} (Q_i[z_i(s)] - R_i), \]


where $h$, $s$ assume only integer values. 

Proof. Since the hypothesis for the N-dimensional problem implies the hy- 
pothesis for each of its one-dimensional components, and the conclusion for 
each component implies the conclusion for the entire problem, it is sufficient to 
prove the theorem in the one-dimensional case. Suppose, then, that $x$ is one- 
dimensional, integer-valued, and bounded, say $P\{x = j\} = p_j$, $\sum_{j=-m}^{m} p_j = 1$, 
$\min_w Q(w) = Q(w_1) = R$, where $Q(w) = \sum_{j=-m}^{m} p_j f(j - w)$, and let $z_h$ be any esti- 
mate defined for all integers $h$. We have 


\[ R(h) = \sum_{j=-m}^{m} p_j f(j - z_{j+h}). \]


For any integers $A$, $B$ with $A < B$, we have 


\[ \sum_{h=A}^{B} R(h) = \sum_{i=A-m}^{B+m} \ \sum_{j=\max(-m,\, i-B)}^{\min(m,\, i-A)} p_j f(j - z_i). \]


For $B - A > 2m$, 


(1)    \[ \sum_{h=A}^{B} R(h) = \sum_{i=A-m}^{A+m-1} \sum_{j=-m}^{i-A} p_j f(j - z_i) + \sum_{i=A+m}^{B-m} \sum_{j=-m}^{m} p_j f(j - z_i) + \sum_{i=B-m+1}^{B+m} \sum_{j=i-B}^{m} p_j f(j - z_i). \]


For any set of $2m$ numbers $(u_{-m}, \ldots, u_{m-1}) = u$, define 


\[ g(u) = \sum_{i=-m}^{m-1} \sum_{j=-m}^{i} p_j f(j - u_i), \qquad G(u) = \sum_{i=-m}^{m-1} \sum_{j=i+1}^{m} p_j f(j - u_i), \]


so that $g(u) + G(u) = \sum_{i=-m}^{m-1} Q(u_i)$. Then 


(2)    \[ \sum_{h=A}^{B} R(h) = g(P_{A-m}) + G(P_{B-m+1}) + \sum_{i=A+m}^{B-m} Q(z_i), \]


where, for any integer $a$, $P_a = (z_a, \ldots, z_{a+2m-1})$. (2) may also be written as 


(3)    \[ \sum_{h=A}^{B} R(h) = g(P_{A-m}) - g(P_{B-m+1}) + \sum_{i=A+m}^{B+m} Q(z_i) \]

or 

(4)    \[ \sum_{h=A}^{B} R(h) = -G(P_{A-m}) - g(P_{B-m+1}) + \sum_{i=A-m}^{B+m} Q(z_i). \]


Since $g(P)$ and $G(P)$ are nonnegative for all $P$, (2) and (4) give 


\[ \sum_{i=A+m}^{B-m} Q(z_i) \le \sum_{h=A}^{B} R(h) \le \sum_{i=A-m}^{B+m} Q(z_i), \]


so that 


\[ \sum_{i=A+m}^{B-m} (Q(z_i) - R) - 2mR \le \sum_{h=A}^{B} (R(h) - R) \le \sum_{i=A-m}^{B+m} (Q(z_i) - R) + 2mR. \]

Now $Q(z_i) \ge R$ for all $i$. If $\sum_{i=-\infty}^{\infty} [Q(z_i) - R]$ diverges, then it follows that 
$\lim_{A \to -\infty,\, B \to \infty} \sum_{h=A}^{B} [R(h) - R] = \infty$. If $\sum_{i=-\infty}^{\infty} [Q(z_i) - R]$ converges, then, as $i \to \pm\infty$, 
$Q(z_i) \to R$, $z_i \to w_1$, and $P_i \to (w_1, w_1, \ldots, w_1) = P^*$. Since $g$ is continuous, as 
$A \to -\infty$, $B \to \infty$, we have $g(P_{A-m}) \to g(P^*)$, $g(P_{B-m+1}) \to g(P^*)$, so that 
(3) yields 


\[ \lim_{\substack{A \to -\infty \\ B \to \infty}} \sum_{h=A}^{B} (R(h) - R) = \sum_{i=-\infty}^{\infty} [Q(z_i) - R]. \]

Theorem 1. Under the same hypotheses as those of the Lemma, if $z_1(s), \ldots, z_k(s)$ 
is any estimate with $R(h) \le R$ for all integers $h$, then $R(h) = R$ for all $h$, and $z_i(s) = 
w_i$ for all integers $s$ and all $i = 1, \ldots, k$: the estimate is the Girshick-Savage 
estimate for integers. 

Proof. According to the Lemma, we have 


\[ \sum_{h=-\infty}^{\infty} [R(h) - R] = \sum_{i=1}^{k} a_i \sum_{s=-\infty}^{\infty} [Q_i(z_i(s)) - R_i]. \]


The left side is nonpositive by hypothesis and the right side is nonnegative, 
so that both sides are zero, and $R(h) = R$ for all $h$, $z_i(s) = w_i$ for all $s$. 

When the $s_{ij}$ are restricted to be integral, $h$ may as well also be so restricted, 
since the statistician can, by considering $y^* = r_i + \epsilon[s]$ when $y = r_i + \epsilon s$, 
reduce the problem to one where $h$ is replaced by $[h]$. For completeness, however, 
we prove 

Theorem 2. Under the same hypotheses as those of the Lemma, the Girshick- 
Savage estimate is admissible. 

Proof. Let $z_1(s), \ldots, z_k(s)$ be any estimate for which $R(h) \le R$ for all $h$. 
($s$, $h$ now assume all real values.) For any $h_0$, consider the estimate $z_i^*(s) = 
z_i(h_0 + s)$. Then $R^*(h) = R(h_0 + h) \le R$ for all $h$. In particular, $R^*(h) \le R$ 
for all integral $h$, so that, by Theorem 1, $R^*(h) = R$, $z_i^*(s) = w_i$ for all integers 
$h$, $s$, and all $i$. Choosing $h = 0$ and $s = 0$ yields $R(h_0) = R$, $z_i(h_0) = w_i$ for all 
$i$: $z_1(s), \ldots, z_k(s)$ is the Girshick-Savage estimate. 

Remark. The above results are closely related to 

Theorem 3. Let $S$ be any closed bounded strictly convex subset of N-space which 
is tangent to the line $x_1 = \cdots = x_N$ at the point $(w, \ldots, w) = P^*$. The only 
sequence of numbers $\{z_n\}$, $-\infty < n < \infty$, for which each point $P_n = (z_{n+1}, \ldots, 
z_{n+N}) \in S$ is $z_n = w$. 

Thus, if $f(d)$ is strictly convex and $f(d) \to \infty$ as $d \to \pm\infty$, and $p_i > 0$, $|i| \le m$, 
the set $\sum_{i=-m}^{m} p_i f(u_i - i) \le \min_w \sum_{i=-m}^{m} p_i f(w - i)$ is a closed bounded strictly convex 
subset of $R_{2m+1}$, tangent to the line $u_{-m} = \cdots = u_m$ at the point $(w_0, w_0, \ldots, 
w_0)$, where $\min_w \sum_{i=-m}^{m} p_i f(w - i)$ occurs at $w_0$. The theorem then asserts that the 
only estimate $\{z_h\}$ with $R(h) \le R$ for all $h$ is the Girshick-Savage estimate. 
The proof of this theorem follows the pattern of the proof of the lemma but is 
simpler in detail, as follows. 

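Let $L(x) = \sum_{i=1}^{N} a_i x_i = 0$ be a tangent plane to $S$ at $P^*$ which contains the line 
$x_1 = \cdots = x_N$; say $L(x) \le 0$ for $x \in S$. For $B - A > N$, 


\[ \sum_{h=A}^{B} L(P_h) = \sum_{i=A+1}^{B+N} z_i \sum_{j=\max(1,\, i-B)}^{\min(N,\, i-A)} a_j = \sum_{i=1}^{N-1} b_i z_{A+i} - \sum_{i=1}^{N-1} b_i z_{B+1+i} = M(P_A) - M(P_{B+1}), \]


where $b_i = \sum_{j=1}^{i} a_j$ and $M(x) = \sum_{i=1}^{N-1} b_i x_i$, using the fact that $\sum_{i=1}^{N} a_i x_i = 0$ con- 
tains the point $(1, \ldots, 1)$, so that $\sum_{i=1}^{N} a_i = 0$. If all points $P_h \in S$, then $L(P_h) 
\le 0$ for all $h$. Since $M$ is bounded on $S$, $\sum_{h=-\infty}^{\infty} L(P_h)$ converges and, as $h \to \pm\infty$, 
$L(P_h) \to 0$ and $P_h \to P^*$. Then 


\[ \sum_{h=-\infty}^{\infty} L(P_h) = \lim_{\substack{A \to -\infty \\ B \to \infty}} \{M(P_A) - M(P_{B+1})\} = 0, \qquad L(P_h) = 0 \]


for all $h$, so that $P_h = P^*$ for all $h$. 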


3. Examples. We present here two one-dimensional examples in which the 
Girshick-Savage estimate fails to be admissible. 

Example 1. $x = \pm 1$, each with probability $\tfrac{1}{2}$, $f(d) = |d|$ for $|d| \le 1$, 
$f(d) = 1$ for $|d| > 1$. We have $Q(w) = \tfrac{1}{2} f(w - 1) + \tfrac{1}{2} f(w + 1)$, $\min Q(w) = 
\tfrac{1}{2} = Q(-1) = Q(1)$. Thus there are two Girshick-Savage estimates: $z = -1$ 
and $z = 1$, each yielding the constant risk $R = \tfrac{1}{2}$. (The corresponding estimates 
for $h$ when $y = n$ is observed are $n + 1$ and $n - 1$.) The estimate $z_n = -1$ for 
$n < 0$, $z_n = 1$ for $n \ge 0$ (i.e., for $h$, estimate $n + 1$ if $n < 0$, and $n - 1$ if $n \ge 0$), 
which is not a Girshick-Savage estimate, has $R(h) = \tfrac{1}{2}$ for $h \ne -1, 0$, $R(-1) = 
R(0) = 0$. One can even be frivolous at a single point, setting $z_n = -1$ for $n < 
0$, $z_n = 1$ for $n > 0$, $z_0 = 7$, and still obtain $R(h) = \tfrac{1}{2}$ for $h \ne 0$, $R(0) = 0$, 
an improvement over the constant risk $\tfrac{1}{2}$. The extension of either estimate to all 
$h$ can be made, for instance by defining $z(y) = z_{[y]}$, where $[y]$ denotes the largest 
integer not exceeding $y$. 
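
The risks quoted in Example 1 can be checked by direct computation. The following sketch is an illustration added here, not part of the original paper; it uses the one-dimensional risk formula $R(h) = \sum_j p_j f(j - z_{j+h})$ from the proof of the Lemma, restricted to integer $h$, and the function names are hypothetical.

```python
from fractions import Fraction


def loss(d):
    """Loss of Example 1: |d| for |d| <= 1, and 1 for |d| > 1."""
    return min(abs(Fraction(d)), Fraction(1))


def risk(z, h):
    """R(h) = 1/2 f(1 - z(h + 1)) + 1/2 f(-1 - z(h - 1)) for x = +1 or -1."""
    half = Fraction(1, 2)
    return half * loss(1 - z(h + 1)) + half * loss(-1 - z(h - 1))


def girshick_savage(n):
    """One of the two constant Girshick-Savage estimates, z = 1."""
    return 1


def split(n):
    """z_n = -1 for n < 0 and z_n = 1 for n >= 0."""
    return -1 if n < 0 else 1


def frivolous(n):
    """Same as split away from 0, with the arbitrary value z_0 = 7."""
    if n == 0:
        return 7
    return -1 if n < 0 else 1


for z in (girshick_savage, split, frivolous):
    print(z.__name__, [str(risk(z, h)) for h in range(-3, 4)])
```

Exact rational arithmetic is used so that the values 0 and 1/2 appear without rounding.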

Example 2. In Example 1, the pathology occurred in the loss function. 
We now set $f(d) = d^2$, so that the expected loss is simply the mean square error, 
and exhibit an $x$ for which the Girshick-Savage estimate is not admissible. 
Since, for any $x$, $f(d) = d^2$ will satisfy the hypotheses of Theorem 1, we must go 
beyond bounded, integer-valued variables. 

Let $e_0 = -1$, $e_i > 0$ for $i = 1, \ldots, k$ be $k + 1$ rationally incommensurable 
numbers (i.e., the only integers $n_i$ such that $\sum_{i=0}^{k} n_i e_i = 0$ are $n_i = 0$, $i = 0, \ldots, 
k$), and let $P\{x = -e_i\} = p_i$, with the $p_i$ chosen so that $\sum_{i=0}^{k} p_i = 1$, $p_i e_i =$ con- 
stant for $i = 1, \ldots, k$, and $\sum_{i=0}^{k} p_i e_i = 0$. For given $e_i$, these requirements de- 
termine $p_0, \ldots, p_k$ uniquely, and $p_i > 0$ for $i = 0, \ldots, k$. Let $S$ be the additive 
group determined by $e_0, \ldots, e_k$, i.e., the set of all numbers representable as 
$\sum_{i=0}^{k} n_i e_i$, where $n_0, \ldots, n_k$ are integers. Then for $h \in S$, all values of $x + h \in S$. 
We shall define an estimate $z(s)$ for $s \in S$ for which $R(h) \le R$ for all $h \in S$. 
The extension of $z$ to all real numbers will be straightforward. For any $z(s)$, 
the inequality $R(h) \le R$ becomes 


\[ \sum_{i=0}^{k} p_i [e_i + z(h - e_i)]^2 \le \sum_{i=0}^{k} p_i e_i^2, \]


\[ \sum_{i=0}^{k} p_i z^2(h - e_i) \le -2 \sum_{i=0}^{k} p_i e_i z(h - e_i), \]


\[ \frac{1}{2} \left[ z^2(h - e_0) + \frac{1}{k} \sum_{i=1}^{k} \frac{z^2(h - e_i)}{e_i} \right] \le z(h - e_0) - \frac{1}{k} \sum_{i=1}^{k} z(h - e_i). \]

If $s = \sum_{i=0}^{k} n_i e_i$, we define $z(s) = 0$ unless $\sum_{i=0}^{k} n_i = 0$. If $\sum_{i=0}^{k} n_i = 0$, we represent 
$s$ by the vector $v = (n_1, \ldots, n_k)$. Let $y_1, y_2, \ldots$ be independent vector chance 
variables, with $P(y_i = \delta_j) = 1/k$, $j = 1, \ldots, k$; $i = 1, 2, \ldots$ (where $\delta_j$ is the vector 
with $k$ components of which the $j$th is 1 and the others are all 0's). Let $z_0(v)$ be 
the probability that $y_1 + \cdots + y_N = v$, where $N = \sum_{i=1}^{k} n_i$, $z_0(0) = 1$. Then 
$z_0(v) = 0$ if any $n_i < 0$, and $z_0(v) = N!/(n_1! \cdots n_k!\, k^N)$ if all $n_i \ge 0$. We shall 
define $z(v) = a_N z_0(v)$, and choose nonnegative numbers $a_N$ so as to satisfy 
$R(h) \le R$. This inequality becomes, for $v \ne 0$, 


(5)    \[ \frac{1}{2} \left[ a_{N+1}^2 z_0^2(v) + \frac{a_N^2}{k} \sum_{i=1}^{k} \frac{z_0^2(v - \delta_i)}{e_i} \right] \le (a_{N+1} - a_N) z_0(v), \]


where $h - e_0 = \sum_{i=0}^{k} n_i e_i$, $v = (n_1, \ldots, n_k)$, $\sum_{i=1}^{k} n_i = N + 1$, using the fact 
that $z_0(v) = \frac{1}{k} \sum_{i=1}^{k} z_0(v - \delta_i)$ for $v \ne 0$. For $v = 0$, the requirement is $a_0^2 \le 
2a_0$, i.e., $a_0 \le 2$. Since $z_0(v) = 0$ when $\sum_{i=1}^{k} n_i \le 0$, $v \ne 0$, (5) is satisfied for $N < 0$, 
$v \ne 0$. For $N \ge 0$, let $w_N = \max_{\sum n_i = N} z_0(v)$. This maximum occurs for the 
choice of $n_1, \ldots, n_k$, unique except for order, for which $|n_i - n_j| \le 1$ 
for all $i$, $j$, and Stirling's formula yields $w_N / N^{(1-k)/2} \to c_1$ as $N \to \infty$, where $c_1$ is 
a positive constant. Then there is a $c > 0$ with $w_N \le c N^{(1-k)/2}$ for $N = 1, 2, \ldots$. 

Since $z_0(v) = \frac{1}{k} \sum_{i=1}^{k} z_0(v - \delta_i)$ and $z_0 \ge 0$ for all $v$, we have $z_0(v - \delta_i) \le k z_0(v)$. 
Thus for nondecreasing $a_N$, the left member of (5) is less than $d\, a_{N+1}^2 z_0^2(v)$, 
where $d$ is a positive constant (for fixed $k$ and $e_i$). Thus (5) is satisfied if 
$d\, a_{N+1}^2 z_0^2(v) \le (a_{N+1} - a_N) z_0(v)$, i.e., if $d\, a_{N+1}^2 z_0(v) \le a_{N+1} - a_N$. Since 
$z_0(v) \le c (N + 1)^{(1-k)/2}$ it is sufficient to choose $a_N$ such that 


(6)    \[ a_{N+1}^2 \le b (a_{N+1} - a_N)(N + 1)^{(k-1)/2}, \]


where $b$ is a positive constant. If $a_N = N^{\epsilon}$, $0 < \epsilon < \tfrac{1}{2}$, and $k > 3$, (6) will be 
satisfied for sufficiently large $N$, say $N \ge N_0$. Setting $a_N = 0$ for $N < N_0$ and 
$a_N = N^{\epsilon}$ for $N \ge N_0$ satisfies (6) for all $N$, with actual inequality for sufficiently 
large $N$. 

Thus, for $k > 3$, we have defined an estimate $z(s)$ for $s \in S$ with $R(h) \le R$ 
for all $h \in S$ and $R(h) < R$ for at least one $h \in S$, say $h_0$, where 


\[ R(h) = \sum_{i=0}^{k} p_i [e_i + z(h - e_i)]^2. \]


For any $h_1 \in S$, let $z_1(s) = z(s + h_1)$. Then $R_1(h) = R(h + h_1)$, so that $R_1(h) \le R$ 
for all $h$ and $R_1(h_0 - h_1) < R$. Thus for each $h^* \in S$ there is an estimate $z_{h^*}(s)$ 
with $R_{h^*}(h) \le R$ for all $h$, $R_{h^*}(h^*) < R$. Let $\alpha_h$ be a set of positive numbers with 
$\sum_{h \in S} \alpha_h = 1$, and define $z^*(s) = \sum_{h \in S} \alpha_h z_h(s)$; since the original $z$ is bounded, 
the series converges. Since $R(h)$ is a convex function of $z$, $R^*(s) \le \sum_{h \in S} \alpha_h R_h(s) < R$ 
for $s \in S$, and $z^*(s)$ is the required estimate. 

To extend $z^*$ to all real numbers, divide all real numbers into classes $S_\alpha$, 
with $y_1$ in the same class as $y_2$ if and only if $y_1 - y_2 \in S$, and choose a repre- 
sentative $t_\alpha$ of each $S_\alpha$. Then every $y$ has a unique representation $y = t_\alpha + s$, 
$s \in S$; define $z(y) = z^*(s)$. For any $h = t_\alpha + s$, 


\[ R(h) = \sum_{i=0}^{k} p_i [e_i + z^*(s - e_i)]^2 = R^*(s) < R. \]
=O 


Notice that the extension $z(y)$ of $z^*$ is a nonmeasurable (Lebesgue) function of 
$y$. It can be shown that any $z(y)$ with $R(h) < R$ for all $h$ is necessarily non- 
measurable; a variation of the method of proof of Theorem 3, evaluating 
$\int_A^B L(P_h)\, dh$ instead of $\sum_{h=A}^{B} L(P_h)$ over integral $h$, shows that for any Lebesgue 
measurable $z(y)$ with $R(h) \le R$ for all $h$, the set $R(h) < R$ has Lebesgue measure 
zero and that, for almost all $y$, the estimate $z(y) = 0$, agreeing with the Girshick- 
Savage estimate. 



A STOCHASTIC APPROXIMATION METHOD¹ 


By Herbert Robbins and Sutton Monro 


University of North Carolina 


1. Summary. Let $M(x)$ denote the expected value at level $x$ of the response 
to a certain experiment. $M(x)$ is assumed to be a monotone function of $x$ but is 
unknown to the experimenter, and it is desired to find the solution $x = \theta$ of the 
equation $M(x) = \alpha$, where $\alpha$ is a given constant. We give a method for making 
successive experiments at levels $x_1, x_2, \ldots$ in such a way that $x_n$ will tend to $\theta$ in 
probability. 


2. Introduction. Let $M(x)$ be a given function and $\alpha$ a given constant such 
that the equation 


(1)    \[ M(x) = \alpha \]


has a unique root $x = \theta$. There are many methods for determining the value of $\theta$ 
by successive approximation. With any such method we begin by choosing one or 
more values $x_1, \ldots, x_r$ more or less arbitrarily, and then successively obtain new 
values $x_n$ as certain functions of the previously obtained $x_1, \ldots, x_{n-1}$, the values 
$M(x_1), \ldots, M(x_{n-1})$, and possibly those of the derivatives $M'(x_1), \ldots, M'(x_{n-1})$, 
etc. If 


(2)    \[ \lim_{n \to \infty} x_n = \theta, \]

irrespective of the arbitrary initial values $x_1, \ldots, x_r$, then the method is 
effective for the particular function $M(x)$ and value $\alpha$. The speed of the con- 
vergence in (2) and the ease with which the $x_n$ can be computed determine the 
practical utility of the method. 

We consider a stochastic generalization of the above problem in which the 
nature of the function $M(x)$ is unknown to the experimenter. Instead, we suppose 
that to each value $x$ corresponds a random variable $Y = Y(x)$ with distribution 
function $\Pr[Y(x) \le y] = H(y \mid x)$, such that 


(3)    \[ M(x) = \int_{-\infty}^{\infty} y \, dH(y \mid x) \]


is the expected value of $Y$ for the given $x$. Neither the exact nature of $H(y \mid x)$ 
nor that of $M(x)$ is known to the experimenter, but it is assumed that equation (1) 
has a unique root $\theta$, and it is desired to estimate $\theta$ by making successive observa- 
tions on $Y$ at levels $x_1, x_2, \ldots$ determined sequentially in accordance with some 
definite experimental procedure. If (2) holds in probability irrespective of any 
arbitrary initial values $x_1, \ldots, x_r$, we shall, in conformity with usual statistical 
terminology, call the procedure consistent for the given $H(y \mid x)$ and value $\alpha$. 


1 This work was supported in part by the Office of Naval Research. 

In what follows we shall give a particular procedure for estimating $\theta$ which is 
consistent under certain restrictions on the nature of $H(y \mid x)$. These restrictions 
are severe, and could no doubt be lightened considerably, but they are often 
satisfied in practice, as will be seen in Section 4. No claim is made that the 
procedure to be described has any optimum properties (i.e., that it is "efficient") 
but the results indicate at least that the subject of stochastic approximation is 
likely to be useful and is worthy of further study. 


3. Convergence theorems. We suppose henceforth that $H(y \mid x)$ is, for every $x$, 
a distribution function in $y$, and that there exists a positive constant $C$ such that 


(4)    \[ \Pr[\,|Y(x)| \le C\,] = \int_{-C}^{C} dH(y \mid x) = 1 \quad \text{for all } x. \]


It follows in particular that for every $x$ the expected value $M(x)$ defined by (3) 
exists and is finite. We suppose, moreover, that there exist finite constants $\alpha$, 
$\theta$ such that 


(5)    \[ M(x) \le \alpha \ \text{ for } x < \theta, \qquad M(x) \ge \alpha \ \text{ for } x > \theta. \]

Whether $M(\theta) = \alpha$ is, for the moment, immaterial. 
Let $\{a_n\}$ be a fixed sequence of positive constants such that 


(6)    \[ 0 < \sum_{n=1}^{\infty} a_n^2 = A < \infty. \]


We define a (nonstationary) Markov chain $\{x_n\}$ by taking $x_1$ to be an arbitrary 
constant and defining 


(7)    \[ x_{n+1} - x_n = a_n(\alpha - y_n), \]

where $y_n$ is a random variable such that 

(8)    \[ \Pr[y_n \le y \mid x_n] = H(y \mid x_n). \]

Let 

(9)    \[ b_n = E(x_n - \theta)^2. \]

We shall find conditions under which 


(10)    \[ \lim_{n \to \infty} b_n = 0 \]


no matter what the initial value $x_1$. As is well known, (10) implies the convergence 
in probability of $x_n$ to $\theta$. 
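From (7) we have 


(11)    \[ \begin{aligned} b_{n+1} &= E(x_{n+1} - \theta)^2 = E\big[E[(x_{n+1} - \theta)^2 \mid x_n]\big] \\ &= E\left[\int_{-\infty}^{\infty} \{(x_n - \theta) - a_n(y - \alpha)\}^2 \, dH(y \mid x_n)\right] \\ &= b_n + a_n^2 E\left[\int_{-\infty}^{\infty} (y - \alpha)^2 \, dH(y \mid x_n)\right] - 2a_n E[(x_n - \theta)(M(x_n) - \alpha)]. \end{aligned} \]

Setting 


(12)    \[ d_n = E[(x_n - \theta)(M(x_n) - \alpha)], \]


(13)    \[ e_n = E\left[\int_{-\infty}^{\infty} (y - \alpha)^2 \, dH(y \mid x_n)\right], \]


we can write 

(14)    \[ b_{n+1} - b_n = a_n^2 e_n - 2a_n d_n. \]

Note that from (5) 

\[ d_n \ge 0, \]

while from (4) 

\[ 0 \le e_n \le [C + |\alpha|]^2 < \infty. \]


Together with (6) this implies that the positive-term series $\sum_{1}^{\infty} a_n^2 e_n$ converges. 
Summing (14) we obtain 


(15)    \[ b_{n+1} = b_1 + \sum_{j=1}^{n} a_j^2 e_j - 2 \sum_{j=1}^{n} a_j d_j. \]


Since $b_{n+1} \ge 0$ it follows that 


(16)    \[ \sum_{j=1}^{n} a_j d_j \le \frac{1}{2}\left[b_1 + \sum_{j=1}^{\infty} a_j^2 e_j\right] < \infty. \]


Hence the positive-term series 

(17)    \[ \sum_{n=1}^{\infty} a_n d_n \]

converges. It follows from (15) that 


(18)    \[ \lim_{n \to \infty} b_n = b_1 + \sum_{n=1}^{\infty} a_n^2 e_n - 2 \sum_{n=1}^{\infty} a_n d_n = b \]

exists; $b \ge 0$. 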
Now suppose that there exists a sequence $\{k_n\}$ of nonnegative constants 
such that 


(19)    \[ d_n \ge k_n b_n, \qquad \sum_{n=1}^{\infty} a_n k_n = \infty. \]


From the first part of (19) and the convergence of (17) it follows that 

(20)    \[ \sum_{n=1}^{\infty} a_n k_n b_n < \infty. \]

From (20) and the second part of (19) it follows that for any $\epsilon > 0$ there must 
exist infinitely many values $n$ such that $b_n < \epsilon$. Since we already know that 
$b = \lim_{n \to \infty} b_n$ exists, it follows that $b = 0$. Thus we have proved 

Lemma 1. If a sequence $\{k_n\}$ of nonnegative constants exists satisfying (19) 
then $b = 0$. 


Let 

(21)    \[ A_n = |x_1 - \theta| + [C + |\alpha|](a_1 + a_2 + \cdots + a_{n-1}); \]

then from (4) and (7) it follows that 

(22)    \[ \Pr[\,|x_n - \theta| \le A_n\,] = 1. \]


Now set 


(23)    \[ k_n = \inf \left[\frac{M(x) - \alpha}{x - \theta}\right] \quad \text{for } 0 < |x - \theta| \le A_n. \]


From (5) it follows that $k_n \ge 0$. Moreover, denoting by $P_n(x)$ the probability 
distribution of $x_n$, we have 


(24)    \[ d_n = \int_{|x - \theta| \le A_n} (x - \theta)(M(x) - \alpha) \, dP_n(x) \ge \int_{|x - \theta| \le A_n} k_n |x - \theta|^2 \, dP_n(x) = k_n b_n. \]


It follows that the particular sequence $\{k_n\}$ defined by (23) satisfies the first 
part of (19). 

In order to establish the second part of (19) we shall make the following 
assumptions: 


(25)    \[ k_n \ge \frac{K}{A_n} \]


for some constant $K > 0$ and sufficiently large $n$, and 


(26)    \[ \sum_{n=2}^{\infty} \frac{a_n}{a_1 + \cdots + a_{n-1}} = \infty. \]


It follows from (26) that 


(27)    \[ \sum_{n=1}^{\infty} a_n = \infty, \]

and hence for sufficiently large $n$ 

(28)    \[ 2[C + |\alpha|](a_1 + \cdots + a_{n-1}) \ge A_n. \]

This implies by (25) that for sufficiently large $n$ 


(29)    \[ a_n k_n \ge \frac{a_n K}{A_n} \ge \frac{a_n K}{2[C + |\alpha|](a_1 + \cdots + a_{n-1})}, \]


and the second part of (19) follows from (29) and (26). This proves 
Lemma 2. If (25) and (26) hold then $b = 0$. 

The hypotheses (6) and (26) concerning $\{a_n\}$ are satisfied by the sequence 
$a_n = 1/n$, since 

\[ \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}, \qquad \sum_{n=2}^{\infty} \frac{1}{n\left(1 + \frac{1}{2} + \cdots + \frac{1}{n-1}\right)} = \infty. \]


More generally, any sequence $\{a_n\}$ such that there exist two positive constants 
$c'$, $c''$ for which 

(30)    \[ \frac{c'}{n} \le a_n \le \frac{c''}{n} \]

will satisfy (6) and (26). We shall call any sequence $\{a_n\}$ which satisfies (6) 
and (26), whether or not it is of the form (30), a sequence of type $1/n$. 
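
The two requirements on a sequence of type $1/n$ can be inspected numerically. The sketch below is an added illustration, not part of the original paper: it accumulates the partial sums appearing in (6) and (26) for $a_n = 1/n$. The function name `partial_sums` is hypothetical, and finite partial sums can only suggest, not prove, convergence or divergence.

```python
import math


def partial_sums(a, n_max):
    """Return the partial sums of (6) and (26) up to n_max.

    First value:  sum of a_n^2 for 1 <= n <= n_max.
    Second value: sum of a_n / (a_1 + ... + a_{n-1}) for 2 <= n <= n_max.
    """
    squares = 0.0
    ratios = 0.0
    head = 0.0  # running value of a_1 + ... + a_{n-1}
    for n in range(1, n_max + 1):
        a_n = a(n)
        squares += a_n * a_n
        if n >= 2:
            ratios += a_n / head
        head += a_n
    return squares, ratios


def harmonic(n):
    return 1.0 / n


for n_max in (10**2, 10**3, 10**4, 10**5):
    print(n_max, partial_sums(harmonic, n_max))
print("pi^2/6 =", math.pi**2 / 6)
```

The first column of sums levels off just below $\pi^2/6$, while the second keeps growing, though only at roughly the rate of $\log \log n$.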

If $\{a_n\}$ is a sequence of type $1/n$ it is easy to find functions $M(x)$ which satisfy 
(5) and (25). Suppose, for example, that $M(x)$ satisfies the following strength- 
ened form of (5): for some $\delta > 0$, 

(5')    \[ M(x) \le \alpha - \delta \ \text{ for } x < \theta, \qquad M(x) \ge \alpha + \delta \ \text{ for } x > \theta. \]

Then for $0 < |x - \theta| \le A_n$ we have 

(31)    \[ \frac{M(x) - \alpha}{x - \theta} \ge \frac{\delta}{A_n}, \]

so that 

(32)    \[ k_n \ge \frac{\delta}{A_n}, \]

which is (25) with $K = \delta$. From Lemma 2 we conclude 

Theorem 1. If $\{a_n\}$ is of type $1/n$, if (4) holds, and if $M(x)$ satisfies (5') then 
$b = 0$. 


A more interesting case occurs when $M(x)$ satisfies the following conditions: 

(33)    $M(x)$ is nondecreasing, 

(34)    $M(\theta) = \alpha$, 

(35)    $M'(\theta) > 0$. 


We shall prove that (25) holds in this case also. From (34) it follows that 


(36)    \[ M(x) - \alpha = (x - \theta)[M'(\theta) + \epsilon(x - \theta)], \]


where $\epsilon(t)$ is a function such that 


(37)    \[ \lim_{t \to 0} \epsilon(t) = 0. \]

Hence there exists a constant $\delta > 0$ such that 


(38)    \[ \epsilon(t) \ge -\tfrac{1}{2} M'(\theta) \quad \text{for } |t| \le \delta, \]

so that 


(39)    \[ \frac{M(x) - \alpha}{x - \theta} \ge \frac{1}{2} M'(\theta) > 0 \quad \text{for } |x - \theta| \le \delta. \]
Hence, for $\theta + \delta \le x \le \theta + A_n$, since $M(x)$ is nondecreasing, 


(40)    \[ \frac{M(x) - \alpha}{x - \theta} \ge \frac{M(\theta + \delta) - \alpha}{A_n} \ge \frac{\delta M'(\theta)}{2A_n}, \]


while for $\theta - A_n \le x \le \theta - \delta$, 


(41)    \[ \frac{M(x) - \alpha}{x - \theta} = \frac{\alpha - M(x)}{\theta - x} \ge \frac{\alpha - M(\theta - \delta)}{A_n} \ge \frac{\delta M'(\theta)}{2A_n}. \]


Thus, since we may assume without loss of generality that $\delta / A_n \le 1$, 

(42)    \[ \frac{M(x) - \alpha}{x - \theta} \ge \frac{\delta M'(\theta)}{2A_n} \quad \text{for } 0 < |x - \theta| \le A_n, \]

so that (25) holds with $K = \delta M'(\theta)/2 > 0$. This proves 

Theorem 2. If $\{a_n\}$ is of type $1/n$, if (4) holds, and if $M(x)$ satisfies (33), (34), 
and (35), then $b = 0$. 

It is fairly obvious that condition (4) could be considerably weakened without 
affecting the validity of Theorems 1 and 2. A reasonable substitute for (4) 
would be the condition 


(4')    \[ |M(x)| \le C, \qquad \int_{-\infty}^{\infty} (y - M(x))^2 \, dH(y \mid x) \le \sigma^2 < \infty \quad \text{for all } x. \]


We do not know whether Theorems 1 and 2 hold with (4) replaced by (4'). 
Likewise, the hypotheses (33), (34), and (35) of Theorem 2 could be weakened 
somewhat, perhaps being replaced by 


(5'')    \[ M(x) < \alpha \ \text{ for } x < \theta, \qquad M(x) > \alpha \ \text{ for } x > \theta. \]
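
The recursion (7) studied in this section is simple to simulate. The sketch below is an added illustration under stated assumptions, not the authors' own computation: the response curve $M(x) = \tanh(x - \theta)$ with $\theta = 2$, the uniform noise on $[-1, 1]$ (so that (4) holds with $C = 2$), the seed, and the function names are all hypothetical choices. With $\alpha = 0$ the conditions (33), (34), (35) are satisfied, and $a_n = 1/n$ is of type $1/n$.

```python
import math
import random


def robbins_monro(observe, alpha, x1, n_steps, a=lambda n: 1.0 / n):
    """Iterate (7): x_{n+1} = x_n + a_n (alpha - y_n), with y_n = observe(x_n)."""
    x = x1
    for n in range(1, n_steps + 1):
        x += a(n) * (alpha - observe(x))
    return x


rng = random.Random(0)  # fixed seed so the run is reproducible
theta = 2.0             # root of M(x) = alpha, unknown to the "experimenter"


def observe(x):
    """Hypothetical bounded response: M(x) = tanh(x - theta) plus uniform noise."""
    return math.tanh(x - theta) + rng.uniform(-1.0, 1.0)


estimate = robbins_monro(observe, alpha=0.0, x1=0.0, n_steps=20000)
print(estimate)
```

Only the noisy observations `observe(x)` enter the iteration; neither $M$ nor its derivative is evaluated, which is the point of the method.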


4. Estimation of a quantile using response, nonresponse data. Let $F(x)$ be 
an unknown distribution function such that 


(43)    \[ F(\theta) = \alpha \ (0 < \alpha < 1), \qquad F'(\theta) > 0, \]


and let $\{z_n\}$ be a sequence of independent random variables each with the 
distribution function $\Pr[z_n \le x] = F(x)$. On the basis of $\{z_n\}$ we wish to estimate 
$\theta$. However, as sometimes happens in practice (bioassay, sensitivity data), we 
are not allowed to know the values of $z_n$ themselves. Instead, we are free to 
prescribe for each $n$ a value $x_n$ and are then given only the values $\{y_n\}$ where 

(44)    \[ y_n = \begin{cases} 1 & \text{if } z_n \le x_n \ \text{("response")}, \\ 0 & \text{otherwise ("nonresponse")}. \end{cases} \]

How shall we choose the values $\{x_n\}$ and how shall we use the sequence $\{y_n\}$ 
to estimate $\theta$? 

Let us proceed as follows. Choose $x_1$ as our best guess of the value $\theta$ and 
let $\{a_n\}$ be any sequence of constants of type $1/n$. Then choose values $x_2, x_3, \ldots$ 
sequentially according to the rule 


(45)    \[ x_{n+1} - x_n = a_n(\alpha - y_n). \]

Since 

(46)    \[ \Pr[y_n = 1 \mid x_n] = F(x_n), \qquad \Pr[y_n = 0 \mid x_n] = 1 - F(x_n), \]

it follows that (4) holds and that 

(47)    \[ M(x) = F(x). \]

All the hypotheses of Theorem 2 are satisfied, so that 


(48)    \[ \lim_{n \to \infty} x_n = \theta \]

in quadratic mean and hence in probability. In other words, $\{x_n\}$ is a consistent 
estimator of $\theta$. 
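
A small simulation shows rule (45) at work on response, nonresponse data. It is an added sketch under assumptions that are not in the paper: $F$ is taken to be the unit exponential distribution, $\alpha = 1/2$ (so $\theta = \log 2$), the steps are $a_n = 4/n$ (of type $1/n$ by (30)), and the seed and starting value are arbitrary.

```python
import math
import random

rng = random.Random(1)   # fixed seed so the run is reproducible
alpha = 0.5
theta = math.log(2.0)    # median of the unit exponential: F(theta) = 0.5


def response(x):
    """y_n of (44): 1 if the unseen z_n <= x ("response"), else 0."""
    z = rng.expovariate(1.0)  # z_n itself is never shown to the estimator
    return 1 if z <= x else 0


x = 2.0                  # x_1, an initial guess at theta
for n in range(1, 50001):
    a_n = 4.0 / n        # satisfies (30) with c' = c'' = 4
    x += a_n * (alpha - response(x))

print(x, theta)
```

The estimator sees only the 0/1 outcomes, so the same loop applies to any $F$ satisfying (43); only the simulated `response` function would change.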

The efficiency of $\{x_n\}$ will depend on $x_1$ and on the choice of the sequence 
$\{a_n\}$, as well as on the nature of $F(x)$. For any given $F(x)$ there doubtless exist 
more efficient estimators of $\theta$ than any of the type $\{x_n\}$ defined by (45), but 
$\{x_n\}$ has the advantage of being distribution-free. 

In some applications it is more convenient to make a group of $r$ observations 
at the same level before proceeding to the next level. The $n$th group of observa- 
tions will then be 


(49)    \[ y_{(n-1)r+1}, \ldots, y_{nr}, \]


using the notation (44). Let $\bar{y}_n$ = arithmetic mean of the values (49). Then 
setting 


(50)    \[ x_{n+1} - x_n = a_n(\alpha - \bar{y}_n), \]


we have $M(x) = F(x)$ as before, and hence (48) continues to hold. 

The possibility of using a convergent sequential process in this problem was 
first mentioned by T. W. Anderson, P. J. McCarthy, and J. W. Tukey in the 
Naval Ordnance Report No. 65-46 (1946), p. 99. 


5. A more general regression problem. It is clear that the problem of Section 4 
is a special case of a more general regression problem. In fact, using the notation 
of Section 2, consider any random variable $Y$ which is associated with an observable 
value $x$ in such a way that the conditional distribution function of $Y$ for 
fixed $x$ is $H(y \mid x)$; the function $M(x)$ is then the regression of $Y$ on $x$. 

The usual regression analysis assumes that $M(x)$ is of known form with 
unknown parameters, say 


(51)    \[ M(x) = \beta_0 + \beta_1 x, \]

and deals with the estimation of one or both of the parameters $\beta_i$ on the basis of 
observations $y_1, y_2, \ldots, y_n$ corresponding to observed values $x_1, x_2, \ldots, x_n$. 
The method of least squares, for example, yields the estimators $b_i$ which minimize 
the expression 


(52)    \[ \sum_{i=1}^{n} (y_i - [\beta_0 + \beta_1 x_i])^2. \]


Instead of trying to estimate the parameters $\beta_i$ of $M(x)$ under the assumption 
that $M(x)$ is a linear function of $x$, we may try to estimate the value $\theta$ such that 
$M(\theta) = \alpha$, where $\alpha$ is given, without any assumption about the form of $M(x)$. 
If we assume only that $H(y \mid x)$ satisfies the hypotheses of Theorem 2 then the 
sequence of estimators $\{x_n\}$ of $\theta$ defined by (7) will at least be consistent. This 
indicates that a distribution-free sequential system of making observations, 
such as that given by (7), is worth investigating from the practical point of 
view in regression problems. 

One of us is investigating the properties of this and other sequential designs 
as a graduate student; the senior author is responsible for the convergence 
proof in Section 3. 





SOME BOUNDED SIGNIFICANCE LEVEL PROPERTIES OF THE EQUAL- 
TAIL SIGN TEST 


By John E. Walsh 
The Rand Corporation¹ 


1. Summary. In addition to being easily applied and reasonably efficient 
for small samples, the equal-tail sign procedure for testing hypotheses about, or 
setting confidence intervals for, the population median is valid under very general 
conditions. (For brevity, the equal-tail sign procedure will be referred to as 
Procedure E.) Rarely, if ever, however, are these conditions exactly satisfied in 
practice. Thus the actual significance level or confidence coefficient for Procedure 
E is only an approximation to the standard value (which holds when the condi- 
tions are satisfied). Undoubtedly the equal-tail sign procedure is used in many 
cases when these conditions are only roughly approximated. The purpose of this 
paper is to investigate under what conditions Procedure E has significance 
levels and confidence coefficients which are satisfactory approximations to the 
standard values. It is found that the approximation is reasonably good for a wide 
variety of situations if the number of observations is not large. Thus, as far as 
errors of Type I are concerned, Procedure E is a sufficiently close approximation 
for many practical cases. This significance level stability, combined with its 
other favorable properties, suggests that the equal-tail sign procedure be seriously 
considered for application when an inference is to be made from a small number 
of observations to the population median. 

2. Introduction and discussion. Let us consider testing whether the population 
median $\mu$ equals a given hypothetical value $\mu_0$ for situations where alternative 
values of the median greater than $\mu_0$ are to receive the same emphasis as those less 
than this value. The equal-tail sign test represents a solution to this problem 
which is of great practical utility. The computation required for the application of 
an equal-tail sign test is small. The efficiency of these tests is reasonably high for 
small samples from normal populations (see [1]). Also the equal-tail sign test is 
valid under very general conditions. Sufficient conditions are that the observa- 
tions used for the test are statistically independent and from populations which 
satisfy 

(i) the populations have a common median value $\mu$, and 

(ii) no population has a discrete amount of probability concentrated at 
$\mu_0$; i.e., $\Pr(x = \mu_0) = 0$ for each population. 

Here it should be emphasized that $\mu$ is not necessarily unique; there may be an 
entire interval of points which satisfy (i). 

Situations where $\mu$ is not unique but represents a set of points cause little 
difficulty if suitably interpreted. An equal-tail sign test of the null hypothesis 
$\mu = \mu_0$ is merely a method of deciding whether $\mu_0$ is a point having the property 
that $\Pr(x < \mu_0) = \Pr(x > \mu_0) = \tfrac{1}{2}$ for each population. The location of $\mu_0$ among 
the 50% points of a population is usually not of importance. Thus the null 
hypothesis $\mu = \mu_0$ has the interpretation $\mu_0$ in $\mu$. 


1 The author is now with the U. S. Bureau of the Census, Washington 25, D. C. 

Let the $n$ independent observations on which a test is based be denoted by 
$y_1, \ldots, y_n$. Subtract $\mu_0$ from each of these observations. Then $n$ nonzero 
numbers will be obtained (the probability of the number zero occurring is zero). 
The equal-tail sign test for the median can be expressed in terms of the signs 
(+ or −) of these numbers. Let $p$ be the number of positive signs (whence 
$n - p$ is the number of negative signs). The equal-tail sign test for comparing $\mu$ 
with the given hypothetical value $\mu_0$ is: Accept $\mu \ne \mu_0$ if either $p \ge i$ or $p \le n - i$, 
where $i > (n + 1)/2$. An equivalent way to state this test is in terms of order 
statistics. Let $x_1, \ldots, x_n$ represent the values of $y_1, \ldots, y_n$ arranged in increas- 
ing order of magnitude. Using order statistics, the equal-tail sign test for the 
median is 

Test 1. Accept $\mu \ne \mu_0$ if either $x_i < \mu_0$ or $x_{n+1-i} > \mu_0$, where $i > (n + 1)/2$. 

The significance level of Test 1 is a function of i and n which has the value

$$\Pr(x_i < \mu_0) + \Pr(x_{n+1-i} > \mu_0) = \left(\tfrac12\right)^{n-1} \sum_{s=i}^{n} \frac{n!}{s!\,(n-s)!} \tag{1}$$

when conditions (i) and (ii) hold.
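As a rough illustration of Test 1 and of the value (1), the following sketch may help. The function names `sign_test_level` and `sign_test_rejects` are hypothetical and not part of the paper; the code simply evaluates (1) and applies the order-statistic form of the test.

```python
from math import comb


def sign_test_level(n: int, i: int) -> float:
    """Value of (1): (1/2)^(n-1) * sum_{s=i}^{n} n! / (s! (n-s)!)."""
    if not (n + 1) / 2 < i <= n:
        raise ValueError("need (n + 1)/2 < i <= n")
    return sum(comb(n, s) for s in range(i, n + 1)) / 2 ** (n - 1)


def sign_test_rejects(observations, mu0, i: int) -> bool:
    """Test 1: accept mu != mu0 if x_i < mu0 or x_{n+1-i} > mu0.

    x_1 <= ... <= x_n are the ordered observations (1-based in the paper,
    0-based in the list below).
    """
    x = sorted(observations)
    n = len(x)
    if not (n + 1) / 2 < i <= n:
        raise ValueError("need (n + 1)/2 < i <= n")
    return x[i - 1] < mu0 or x[n - i] > mu0
```

For n = 9 and i = 8 the level is 10/256, which is the case plotted in Figure 1.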
The statement of the equal-tail sign test in terms of order statistics is convenient
because equal-tail confidence intervals for $\mu$ can also be derived. Since
$i > (n + 1)/2$, it follows from (1) that

$$(x_{n+1-i},\; x_i)$$

is an equal-tail confidence interval for $\mu$ with confidence coefficient

$$1 - \left(\tfrac12\right)^{n-1} \sum_{s=i}^{n} \frac{n!}{s!\,(n-s)!}$$

if (i) and (ii) hold.
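The interval above can be sketched in a few lines. The helper name `sign_confidence_interval` is an assumption made for illustration; it returns the pair of order statistics together with unity minus the value of (1).

```python
from math import comb


def sign_confidence_interval(observations, i: int):
    """Return ((x_{n+1-i}, x_i), confidence coefficient) for the median.

    The coefficient is exact only under conditions (i) and (ii).
    """
    x = sorted(observations)
    n = len(x)
    if not (n + 1) / 2 < i <= n:
        raise ValueError("need (n + 1)/2 < i <= n")
    level = sum(comb(n, s) for s in range(i, n + 1)) / 2 ** (n - 1)
    # 1-based order statistics x_{n+1-i} and x_i map to indices n-i and i-1.
    return (x[n - i], x[i - 1]), 1.0 - level
```

With nine observations and i = 8 the interval runs from the second smallest to the second largest observation.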


When conditions (i) and (ii) are not necessarily satisfied, Test 1 is no longer 
exact. Its significance level may differ substantially from the value of (1). The 
null hypothesis may not be expressible in the form $\mu = \mu_0$. In many cases,
however, the equal-tail sign test furnishes a reasonably close approximation to a 
fairly large class of tests. This approximation is close in the sense that each test 
of the class has a significance level which is near the value of (1) when conditions 
(i) and (ii) are even roughly satisfied. The principal purpose of this paper is to 
define this class of tests and investigate their significance level properties. 

First, let us consider the form and properties of the null hypotheses for the
class of tests to be investigated. Since condition (i) is not necessarily satisfied, the
null hypothesis can no longer be expressed in the form $\mu = \mu_0$. Let $\mu_j$ represent
the median value (or set of median values) for the population from which the
observation $y_j$ was drawn ($j = 1, \ldots, n$). For each test of the class, the null
hypothesis is required to be some function of $\mu_1, \ldots, \mu_n$ which reduces to
$\mu = \mu_0$ when condition (i) is satisfied. Since these null hypotheses represent
generalizations of the null hypothesis for the sign test ($\mu = \mu_0$), they will be
referred to as generalized null hypotheses. Hence the generalized null hypotheses
considered will be of the form

$$\mu_0 \text{ is contained in } h(\mu_1, \ldots, \mu_n),$$

where the set function h is restricted so that it is contained in the set of 50%
points common to all populations (denoted by $\mu$) when condition (i) holds.
If h is not unique, the generalized null hypothesis has the interpretation $\mu_0$
in h.

The function h is also restricted so that it is nearly the same as $\mu$ when condition
(i) is approximately satisfied. Stated in another way, the function chosen for h
should not be sensitive to condition (i); i.e., a moderate deviation from the
existence of a common median value should not have an appreciable effect on h.
For example, let $\mu_1, \ldots, \mu_n$ be unique and large. Then the function


n—l 


l< 2 
ye it ts oe De (wisn — my) 
1 1 


would not be suitable for use as h even though it reduces to $\mu$ when all the $\mu_j$
have the value $\mu$.

Now let us define the class of tests which are investigated in this paper. All 
tests of the class reduce to the equal-tail sign test when conditions (i) and (ii) 
hold. Consequently, each test of the class will be referred to as a generalized 
test. A generalized test is defined by 

Test 2. Accept that $\mu_0$ is outside of h if either $x_i < \mu_0$ or $x_{n+1-i} > \mu_0$, where
$i > (n + 1)/2$.

The significance level of this test equals

$$\Pr(x_i < \mu_0 \mid \mu_0 \text{ in } h) + \Pr(x_{n+1-i} > \mu_0 \mid \mu_0 \text{ in } h). \tag{2}$$

The value of (2) is not completely determined by i and n. It also depends on
many other factors such as the populations from which the observations were
drawn and the value of $\mu_0$. In spite of this inexactness, the value of (2) is usually
rather closely fixed if h is a reasonable type of function and conditions (i) and (ii)
are even roughly satisfied. The statement of Test 2 defines a class of tests rather
than a single test because of the possible choices for the function h.

It should be pointed out that Test 2 does not necessarily have equal tails. That
is, the value of $\Pr(x_i < \mu_0 \mid \mu_0 \text{ in } h)$ is not necessarily equal to the value of
$\Pr(x_{n+1-i} > \mu_0 \mid \mu_0 \text{ in } h)$. In extreme cases, Test 2 might even be one-sided.

The main problem of the paper is to show that in practice the value of (1) is 
usually a close approximation to the value of (2). This, of course, is not always 
true. For example, consider the case where some or all of the populations from 
which the observations were drawn have a large proportion of their probability 
concentrated at or near the median. Then the value of (2) may differ greatly 
from that of (1) even though conditions (i) and (ii) are very nearly satisfied.
For populations of the type ordinarily encountered in practice and a reasonable
choice of h, however, the value of (1) is usually near that of (2) even when
conditions (i) and (ii) are only roughly satisfied. This is proved by obtaining
upper and lower bounds for (2) as functions of n, i and a quantity $\beta$. Here $\beta$ is
defined to be the greater of

$$\max_j \left| \Pr(y_j < \mu_0 \mid \mu_0 \text{ in } h) - \tfrac12 \right|, \qquad \max_j \left| \Pr(y_j > \mu_0 \mid \mu_0 \text{ in } h) - \tfrac12 \right|.$$

If $\beta = 0$, the significance level of Test 2 equals that of Test 1. If $\beta$ is small, the
value of (2) is very near that of (1). Table 1 contains upper and lower bounds for
the significance level of Test 2 for $\beta$ = .02, .05, .08, .10, .15, .20, and $n \leq 15$. If
the populations are continuous (or very nearly so) at $\mu_0$, the value of the lower
bound is noticeably increased (see Table 2). Thus, for $n \leq 15$, the value of (1)
does not differ greatly from that of (2) even for $\beta$ moderately large. A value of $\beta$
as large as .05 would seem unusual for the ordinary type of practical situation
where there is reason to believe that conditions (i) and (ii) are approximately
satisfied.

Let us consider the practical implications of the fact that the equal-tail 
sign test approximates Test 2 in the sense of significance level. Suppose the 
experimenter recognizes the possibility that conditions (i) and (ii) may not 
hold for his experiment. He then selects the function $h(\mu_1, \ldots, \mu_n)$ which is of
principal interest to him and uses Test 2. In this manner he obtains an accurate 
test of the null hypothesis in which he is interested. On the other hand, suppose 
that the experimenter applies the equal-tail sign test without considering the 
possibility that conditions (i) and (ii) may be violated. The results of this paper 
show that he is protected if the appropriate function h (which he would have 
chosen) and the populations from which the observations were drawn are of a 
reasonable nature. Then he is testing the appropriate null hypothesis at ap- 
proximately the specified significance level even though he may not think of the 
test in this light. 

Since for the case of a sample from a normal population the efficiency of the 
equal-tail sign test decreases as n increases, much of the investigation is limited 
to tests based on 15 or fewer observations. Table 1 contains a list of the tests 
investigated along with their efficiency for normality. The efficiency of a sig- 
nificance test (more precisely, the power efficiency) is defined in [1]. Intuitively 
the efficiency of a test measures the percentage of available information per 
observation which is utilized by that test. 

The equal-tail sign test for the median may be useful for situations where
there is not much information available concerning properties of the populations 
from which the observations were taken. Due to the extremely general conditions 
under which its significance level is approximately determined, this test can be 
used in cases where more specialized tests are not necessarily applicable. 

Approximate confidence intervals for $h(\mu_1, \ldots, \mu_n)$ can be obtained from Test
2. For populations of the type usually encountered in practice and a reasonable
function h,

$$(x_{n+1-i},\; x_i)$$

is a confidence interval for h with confidence coefficient approximately equal to
unity minus the value of (1).

The material presented in this paper is limited to investigation of Type I 
errors of the equal-tail sign test when the conditions on which it is based are 
generalized. Due to the extremely general situations considered, an investigation
of Type II errors was not feasible. However, the results obtained for the particular 
case of a sample from a normal population indicate that the efficiency of the 
equal-tail sign test is reasonably high for most situations if the number of 
observations is small. 

3. Outline of results. This section contains a statement of the main results 
of the paper. The proofs of these statements are given in Section 4. 

The method followed in obtaining bounds for (2) consists in fixing n, i, $\beta$ and
then finding the largest and smallest values of (2) possible on the basis of these 
and any additional restrictions. Thus the bounds represent the worst possible 
situations for the given restrictions. For most situations, the value of (2) would 
likely be nowhere near the values of the bounds. Consequently, for most cases 
the value of (1) will be much nearer (2) than is indicated by the upper and lower 
limits in the tables. 

Let us consider the general case where both conditions (i) and (ii) could be
violated. Values of upper and lower bounds for the significance level of Test 2
as functions of n, i, and $\beta$ are given by

$$\begin{aligned}
\text{upper bound} &= \sum_{s=i}^{n} \frac{n!}{s!\,(n-s)!} \left[ \left(\tfrac12 + \beta\right)^{s} \left(\tfrac12 - \beta\right)^{n-s} + \left(\tfrac12 - \beta\right)^{s} \left(\tfrac12 + \beta\right)^{n-s} \right], \\
\text{lower bound} &= 2 \sum_{s=i}^{n} \frac{n!}{s!\,(n-s)!} \left(\tfrac12 - \beta\right)^{s} \left(\tfrac12 + \beta\right)^{n-s}.
\end{aligned} \tag{3}$$

Thus, if $\beta = 0$ the value of (2) equals (1) while if $\beta$ is small the value of (2)
is very nearly equal to (1). Table 1 contains values of these upper and lower
bounds for the tests considered. A visual example of how the bounds given by (3)
vary as functions of $\beta$ for fixed n and i is given by Figure 1, which contains a
plot of these bounds for the case n = 9, i = 8. If $\beta \to \tfrac12$, the upper bound $\to 1$
and the lower bound $\to 0$.
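The bounds in (3) are easy to evaluate numerically. The sketch below is illustrative only; the function name `level_bounds` is hypothetical, and the code assumes the reading of (3) in which the upper bound takes every $\Pr(y_j < \mu_0)$ equal to $\tfrac12 + \beta$ and the lower bound takes both one-sided probabilities equal to $\tfrac12 - \beta$.

```python
from math import comb


def level_bounds(n: int, i: int, beta: float):
    """Return (lower bound, upper bound) of the significance level of Test 2, per (3)."""
    if not (n + 1) / 2 < i <= n:
        raise ValueError("need (n + 1)/2 < i <= n")
    if not 0.0 <= beta <= 0.5:
        raise ValueError("need 0 <= beta <= 1/2")
    # Tail with success probability 1/2 - beta.
    small_tail = sum(
        comb(n, s) * (0.5 - beta) ** s * (0.5 + beta) ** (n - s)
        for s in range(i, n + 1)
    )
    # Tail with success probability 1/2 + beta.
    large_tail = sum(
        comb(n, s) * (0.5 + beta) ** s * (0.5 - beta) ** (n - s)
        for s in range(i, n + 1)
    )
    return 2.0 * small_tail, large_tail + small_tail
```

At $\beta = 0$ both bounds reduce to the value of (1); at $\beta = \tfrac12$ they are 0 and 1.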

A case of practical interest is that where condition (ii) is not violated to any
appreciable extent; i.e., none of the populations has a noticeable amount of
probability concentrated at $\mu_0$. Then the upper bound given in (3) still holds
but the lower bound is greatly improved. Table 2 contains a list of some numerical
values for this lower bound. These values are only slightly less than the value of
(1) except for large values of $\beta$. The dotted curve in Figure 1 represents a plot
of this lower bound as a function of $\beta$ for the case n = 9, i = 8.

In all the above results, the n observations on which tests are based were
assumed to be independent. Although no analysis will be made for cases in
which the observations are not independent, examination of the significance
level expression (2) for Test 2 indicates that the value of (2) will often be
approximately equal to (1) when the observations are mildly dependent. This
follows from the intuitive observation that in many cases dependence changes
$\Pr(x_i < \mu_0 \mid \mu_0 \text{ in } h)$ and $\Pr(x_{n+1-i} > \mu_0 \mid \mu_0 \text{ in } h)$ in such a way that one
probability expression is increased while the other is decreased; consequently the
value of (2) tends to remain near that of (1).


TABLE 2

Lower bounds for the significance level of Test 2 when populations are continuous at $\mu_0$

Test (accept $\mu_0$      Value     Lower bound for significance level of Test 2
outside h if either)     of (1)
                         β = 0     β=.02   β=.05   β=.08   β=.10   β=.15   β=.20   β=.30   β=.40

                         .1250     .1246   .1225   .1187   .1152   .1035   .0882   .0512   .0162
                         .0625     .0623   .0613   .0593   .0576   .0519   .0441   .0256   .0081
                         .0312     .0311   .0303   .0289   .0276   .0235   .0185   .0082   .0015
                         .0156     .0156   .0152   .0145   .0138   .0118   .0093   .0041   .0007
                         .1250     .1247   .1231   .1202   .1175   .1082   .0953   .0604   .0214
                         .0078     .0078   .0075   .0070   .0066   .0043   .0039   .0013   .0001
                         .0704     .0701   .0688   .0663   .0641   .0567   .0469   .0236   .0049
                         .0039     .0039   .0038   .0035   .0033   .0027   .0019   .0007   .0001
                         .0390     .0389   .0381   .0367   .0354   .0310   .0254   .0125   .0025
                         .0214     .0214   .0208   .0198   .0188   .0158   .0121   .0047   .0005
                         .1094     .1091   .1075   .1044   .1016   .0919   .0785   .0436   .0104
                         .0117     .0117   .0113   .0107   .0102   .0085   .0065   .0024   .0003
                         .0654     .0652   .0641   .0621   .0602   .0538   .0453   .0241   .0055
                         .0386     .0384   .0376   .0360   .0346   .0298   .0237   .0102   .0014
                         .0224     .0224   .0218   .0208   .0200   .0170   .0133   .0055   .0007
                         .0924     .0920   .0907   .0882   .0859   .0779   .0669   .0381   .0096
                         .0130     .0129   .0125   .0118   .0112   .0092   .0068   .0022   .0002
                         .0574     .0572   .0561   .0540   .0522   .0459   .0375   .0176   .0027
                         .0074     .0073   .0071   .0067   .0063   .0051   .0037   .0012   .0001
$x_{12} < \mu_0$ or          .0350     .0350   .0343   .0329   .0317   .0275   .0221   .0099   .0015


4. Derivations. The purpose of this section is to present derivations of the 
results stated in the preceding sections. 





Fig. 1. Bounds of (2) for n = 9, i = 8 (horizontal axis: value of $\beta$).

The expressions for (1) and (2) follow from i > (n + 1)/2, conditions (i) and 
(ii), and elementary probability considerations. Consider relations (3). Let 


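$$\Pr(y_j < \mu_0 \mid \mu_0 \text{ in } h) = \tfrac12 + a_j, \qquad \Pr(y_j = \mu_0 \mid \mu_0 \text{ in } h) = e_j,$$

$$\Pr(y_j > \mu_0 \mid \mu_0 \text{ in } h) = \tfrac12 - a_j - e_j \qquad (j = 1, \ldots, n).$$

Then

$$\begin{aligned}
\Pr(x_i < \mu_0 \mid \mu_0 \text{ in } h) &= \prod_{j=1}^{n} \left(\tfrac12 + a_j\right) + \sum_{r=1}^{n-i} \; \sum_{j_1 > \cdots > j_r = 1}^{n} \left[ \prod_{k=1}^{r} \left(\tfrac12 - a_{j_k}\right) \right] \left[ {\prod}' \left(\tfrac12 + a_j\right) \right], \\
\Pr(x_{n+1-i} > \mu_0 \mid \mu_0 \text{ in } h) &= \prod_{j=1}^{n} \left(\tfrac12 - a_j - e_j\right) + \sum_{r=1}^{n-i} \; \sum_{j_1 > \cdots > j_r = 1}^{n} \left[ {\prod}' \left(\tfrac12 - a_j - e_j\right) \right] \left[ \prod_{k=1}^{r} \left(\tfrac12 + a_{j_k} + e_{j_k}\right) \right],
\end{aligned} \tag{4}$$

where the notation $\prod'$ denotes the product over those values of j ($j = 1, \ldots, n$)
which are different from $j_1, \ldots, j_r$. If i = n, each double summation in (4) is
taken to be zero.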

Examination of (4) shows that (2) can be written in the form

$$f(a_1, \ldots, a_n;\; e_1, \ldots, e_{j-1}, e_{j+1}, \ldots, e_n) + e_j\, g(a_1, \ldots, a_n;\; e_1, \ldots, e_{j-1}, e_{j+1}, \ldots, e_n),$$

where

$$g(a_1, \ldots, a_n;\; e_1, \ldots, e_{j-1}, e_{j+1}, \ldots, e_n) \leq 0$$

for each value of j. Thus, since setting $e_j = 0$ places no additional restrictions on
the possible values of the a's and the other e's, to obtain the maximum value for
(2) all the e's should be zero. Now consider (2) with all the e's equal to zero.
It can be written in the form


$$u(a_1, \ldots, a_{j-1}, a_{j+1}, \ldots, a_n) + a_j\, v(a_1, \ldots, a_{j-1}, a_{j+1}, \ldots, a_n) \tag{5}$$

for each value of j. Since $-\beta \leq a_j \leq \beta$ for all j, the maximum value of (2) is
obtained when the $a_j$ are restricted to be of the form

$$a_j = \eta_j \beta \qquad (j = 1, \ldots, n), \tag{6}$$

where each $\eta_j$ equals either +1 or −1. Assume that an arbitrary but fixed choice
has been made for the $\eta_j$. Then (4) shows that (2) is a polynomial in $\beta$ which
is an even function. Consider the coefficient of an arbitrary even power of $\beta$ in this
polynomial. Examination shows that this coefficient is maximum (algebraically)
for the case where all the $\eta_j$ are chosen to have the same value. Hence (2) is
maximum when

$$e_j = 0, \quad a_j = \beta \qquad (j = 1, \ldots, n). \tag{7}$$


Thus the upper bound for the significance level of Test 2 is that given in (3). 
Now consider the lower bound for (2). Examination of (4) shows that
$\Pr(x_i < \mu_0 \mid \mu_0 \text{ in } h)$ is minimum when $a_j = -\beta$ ($j = 1, \ldots, n$). Similarly,
$\Pr(x_{n+1-i} > \mu_0 \mid \mu_0 \text{ in } h)$ is minimum when $a_j + e_j = \beta$ ($j = 1, \ldots, n$). Thus (2)
is minimum when

$$a_j = -\beta, \quad e_j = 2\beta \qquad (j = 1, \ldots, n).$$

Substitution of these values into (4) verifies the expression given in (3) for the
lower bound.

If the populations for Test 2 all satisfy condition (ii), $e_j = 0$ ($j = 1, \ldots, n$).
From (7), the upper bound of (2) given in (3) is unchanged for this case. The
lower bound, however, can be noticeably larger than the value stated in (3).
Since for each value of j ($j = 1, \ldots, n$), the value of (2) can be expressed in
the form (5), the lower bound of the significance level of Test 2 is equal to the
minimum value which can be obtained for (2) when the $a_j$ are restricted to be
of the form (6) and the $e_j$ have the value zero. As (2) is invariant with respect to
permutations of $a_1, \ldots, a_n$, the problem of obtaining the lower bound of the
significance level of Test 2 is reduced to that of determining the number m
of the $\eta_j$ which equal +1 when the resulting value of (2) is minimum. Since
the lower bound for the significance level of Test 2 is only required for $n \leq 15$
and $i \geq n - 3$ (see Table 2), an analytical method of determining the value of
m which minimizes (2) will not be developed; the values for the lower bounds
listed in Table 2 were obtained by substituting numerical values for m and
computing the resulting values of (2). For example, if i = n and m of the
$a_j$ equal $+\beta$ while the remaining $a_j$ equal $-\beta$, the value of (2) is

$$\left(\tfrac12 + \beta\right)^{m} \left(\tfrac12 - \beta\right)^{n-m} + \left(\tfrac12 - \beta\right)^{m} \left(\tfrac12 + \beta\right)^{n-m}.$$

If i < n, the expressions become much more complicated and will not be given
here.
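The numerical search over m described above can be sketched as follows. This is not the author's computation, only an illustration under the stated restrictions (all $e_j = 0$, m of the $a_j$ equal to $+\beta$ and the rest equal to $-\beta$); the function names are hypothetical. The tail probabilities for i < n are obtained by a simple recursion over independent, non-identical trials rather than from closed-form expressions.

```python
def tail_at_least(probs, i: int) -> float:
    """P(at least i successes) for independent trials with the given success probabilities."""
    dist = [1.0]  # dist[k] = probability of k successes among the trials processed so far
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)
            new[k + 1] += mass * p
        dist = new
    return sum(dist[i:])


def level_for_m(n: int, i: int, beta: float, m: int) -> float:
    """Value of (2) when m of the a_j equal +beta, n - m equal -beta, and all e_j = 0."""
    below = [0.5 + beta] * m + [0.5 - beta] * (n - m)  # Pr(y_j < mu_0)
    above = [1.0 - p for p in below]                   # Pr(y_j > mu_0), because e_j = 0
    return tail_at_least(below, i) + tail_at_least(above, i)


def continuous_lower_bound(n: int, i: int, beta: float) -> float:
    """Minimum of (2) over m = 0, ..., n, found by direct substitution."""
    return min(level_for_m(n, i, beta, m) for m in range(n + 1))
```

For i = n the result agrees with the closed form displayed above; for example, n = 4, i = 4, $\beta$ = .10 gives .1152, the corresponding entry of Table 2.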
THE EXACT DISTRIBUTION OF THE EXTREMAL QUOTIENT


By E. J. GUMBEL AND L. H. HERBACH


New York City and Columbia University


0. Problem. The only quotients considered up to now are those of two observa- 
tions taken from different distributions. Instead of these statistics, we consider 
the quotient Q of the extremes (henceforth called the extremal quotient) for 
$n \geq 2$ independent observations taken from the same distribution. This quotient
of the extremes has sometimes been used by climatologists [3]. Since it is obviously 
not affected by changes of scale, its use may be of interest in cases where the 
scale plays no role. The sensitivity of the extremal quotient to changes in origin 
is brought out by consideration of uniform distributions where the extremal 
quotient for a nonnegative variate has just the opposite qualities of the extremal 
quotient taken from a nonpositive variate. 

The asymptotic distribution of the extremal quotient was given in a previous 
paper [1]. However, the exact distribution of this statistic has never been studied
before.

1. The distribution. Let f(x) and F(x) be the density and cumulative probability
function of a variate X where $-\omega_1 < X < \omega_2$. Let $X_n$ be the largest and
$X_1$ the smallest value in a sample of size n ($n \geq 2$). Then the extremal quotient
is $Q = X_n / X_1$. The exact cumulative probability function H(q) will be given in
terms of pseudo probability functions

$$\begin{aligned}
\Phi_1(q) &= \Pr\{1 < Q < q\}, & X_1 &> 0, \\
\Phi_2(q) &= \Pr\{-\infty < Q < q\}, & X_1 &< 0, \; X_n > 0, \\
\Phi_3(q) &= \Pr\{0 < Q < q\}, & X_n &< 0.
\end{aligned} \tag{1.1}$$

These would be cumulative probability functions of Q if the extremal quotient
were restricted to the quadrant indicated by the subscript (see Figure 1). In
general the cumulative probability function is

$$H(q) = \begin{cases}
\Phi_2(q), & q \leq 0, \\
\Phi_2(0) + \Phi_3(q), & 0 \leq q \leq 1, \\
\Phi_2(0) + \Phi_3(1) + \Phi_1(q), & q \geq 1.
\end{cases} \tag{1.2}$$

Integrating the joint density of the extremes,

$$w(x_1, x_n) = n(n - 1) f(x_1) [F(x_n) - F(x_1)]^{n-2} f(x_n), \tag{1.3}$$
1 The authors wish to take this opportunity to thank Mr. J. A. Greenwood, who
constructively read a first version of this paper.

Fig. 1. Zones of integration. (Legend: zone of zero probability density, i.e., where $w(x_1, x_n) = 0$; area over which probability density is to be integrated in case i = 1, 2, 3.)


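over the shaded areas in the proper quadrants of Figure 1, we obtain $\Phi_1(q)$,
$\Phi_2(q)$, $\Phi_3(q)$, which, when substituted in (1.2), yield after some simplification

$$H(q) = \begin{cases}
c_q - n \displaystyle\int_{\omega_2/q}^{0} [F(qx) - F(x)]^{n-1} f(x)\,dx, & -\infty < q \leq 0, \\[2ex]
c_0 - n \displaystyle\int_{-\omega_1}^{0} [F(qx) - F(x)]^{n-1} f(x)\,dx, & 0 \leq q \leq 1, \\[2ex]
c_0 + n \displaystyle\int_{0}^{\omega_2} [F(x) - F(x/q)]^{n-1} f(x)\,dx, & q \geq 1,
\end{cases} \tag{1.4}$$

where

$$c_q = \left[1 - F\!\left(\frac{\omega_2}{q}\right)\right]^{n} - [1 - F(0)]^{n}, \qquad c_0 = 1 - [1 - F(0)]^{n}, \tag{1.5}$$

and $\omega_1 > 0$, $\omega_2 > 0$.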

It is to be noted that $H(q) = \Phi_1(q)$ or $\Phi_3(q)$ according as the random variable
X is always positive or always negative. Thus the pseudo probability functions
$\Phi_1(q)$ and $\Phi_3(q)$ may be real. Since

$$\Phi_2(0) = 1 - F^{n}(0) - [1 - F(0)]^{n} \tag{1.6}$$

can never be unity for finite n, $\Phi_2(q)$ can never be a cumulative probability
function. But $\Phi_2(0) \to 1$ as $n \to \infty$ if X can take positive and negative values.
Thus, if n is sufficiently large, the extremal quotient may be treated as negative,
as was done in [1]. The speed with which the positive part of the distribution
of Q shrinks with even fairly small values of n may be seen in Figures 5 and 6.
For any initial distribution and sample size an indication of the error committed
by using $H(q) = \Phi_2(q)$ when $\omega_1 > 0$, $\omega_2 > 0$ may be found by seeing how close
the value of (1.6) is to unity.

Fig. 2. Area over which probability density $w(x_1, x_n)$ is to be integrated if $\omega_2 > -\omega_1$.

If $\omega_1$ is negative, and $\omega_2 > -\omega_1$, then $\omega_2/(-\omega_1) > Q > 1$ (as in Figure 2),
and the probability function becomes

$$\begin{aligned}
H(q) &= 1 - n \int_{-\omega_1}^{\omega_2/q} \left\{ (n - 1) \int_{q x_1}^{\omega_2} [F(x_n) - F(x_1)]^{n-2} f(x_n)\,dx_n \right\} f(x_1)\,dx_1 \\
&= \left[1 - F\!\left(\frac{\omega_2}{q}\right)\right]^{n} + n \int_{-\omega_1}^{\omega_2/q} [F(qx) - F(x)]^{n-1} f(x)\,dx.
\end{aligned} \tag{1.7}$$

Similarly, if $\omega_2$ is negative, and $-\omega_1 < \omega_2 < 0$, then $\omega_2/(-\omega_1) < Q < 1$ and the
probability function is symmetrical to the previous case. These special cases
cover, for example, uniform distributions in the intervals 1 < x < 2 and
−2 < x < −1.

The extremal quotients $Q_1$ and $Q_2$ for two variates $X_{(1)}$ and $X_{(2)}$ which are
unlimited in both directions and possess mutually symmetrical distributions,
have probability functions $H_1(q_1)$ and $H_2(q_2)$ which are linked by

$$H_2(1/q_1) = 1 - H_1(q_1). \tag{1.8}$$
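
2. Special cases. For a symmetrical limited distribution where $\omega_1 = \omega_2 = \omega$
(say) and $|X| < \omega$ we have

$$F(-\omega) = 0, \qquad F(\omega) = 1, \qquad F(0) = \tfrac12,$$

and (1.5) becomes

$$c_q = [1 - F(\omega/q)]^{n} - \left(\tfrac12\right)^{n}, \qquad c_0 = 1 - \left(\tfrac12\right)^{n}.$$

For q = −1, 0, 1, the probabilities become

$$H(-1) = \tfrac12 - \left(\tfrac12\right)^{n}, \qquad H(0) = 1 - \left(\tfrac12\right)^{n-1}, \qquad H(1) = 1 - \left(\tfrac12\right)^{n}. \tag{2.1}$$

Therefore the probable error about zero is unity, the median of the extremal
quotient for a symmetrical distribution converges toward −1, and zero is the
practical upper limit for large n.

For symmetrical unlimited distributions, $F(\omega/q)$ vanishes for q < 0, and the
probability consists of only two parts, namely

$$H(q) = \begin{cases}
1 - \left(\tfrac12\right)^{n} - n \displaystyle\int_{0}^{\infty} [F(x) - F(qx)]^{n-1} f(x)\,dx, & q \leq 1, \\[2ex]
1 - \left(\tfrac12\right)^{n} + n \displaystyle\int_{0}^{\infty} [F(x) - F(x/q)]^{n-1} f(x)\,dx, & q \geq 1.
\end{cases} \tag{2.2}$$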
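As a numerical check of (2.1) against (2.2), the following sketch evaluates (2.2) for the standard normal distribution by the midpoint rule. The choice of the normal distribution, the truncation point `upper`, the number of steps, and the function names are all assumptions made for illustration; the integration limits 0 to infinity are those of (2.2) as read here.

```python
from math import erf, exp, pi, sqrt


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def normal_pdf(x: float) -> float:
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def extremal_quotient_cdf(q: float, n: int, upper: float = 12.0, steps: int = 20000) -> float:
    """H(q) from (2.2) for a standard normal sample of size n (midpoint rule on (0, upper))."""
    dx = upper / steps
    integral = 0.0
    for k in range(steps):
        x = (k + 0.5) * dx
        if q <= 1.0:
            inner = normal_cdf(x) - normal_cdf(q * x)
        else:
            inner = normal_cdf(x) - normal_cdf(x / q)
        integral += inner ** (n - 1) * normal_pdf(x) * dx
    base = 1.0 - 0.5 ** n
    return base - n * integral if q <= 1.0 else base + n * integral
```

For n = 3 the three values in (2.1) are 0.375, 0.75 and 0.875, and the numerical integration reproduces them to within the discretization error.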


To apply these methods, we consider first four cases of the uniform distribution
which give quite unexpected results. By virtue of the scale invariance mentioned
in Section 0, we set the length of the interval of variation equal to unity. The
first two cases, f(x) = 1 for 0 < x < 1 and −1 < x < 0, obtained from (1.4) are
summarized in Table 1. The respective distributions of the extremal quotients
for these very closely related distributions have characteristics which are
diametrically opposed. The asymptotic values of the medians are mutually
reciprocal, and the asymptotic distributions of the reduced quotients are the
second and third asymptotic distributions of extreme values [2].

The two examples show that the extremal quotient is very sensitive to operations
like translation which, as a rule, have no influence on the distribution.
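The sensitivity to translation can be seen in a small simulation. This is only an illustrative sketch: the helper names, the seed, and the number of trials are arbitrary choices, and the comparison value $(1 - 1/q)^{n-1}$ is the probability function for the uniform distribution on 0 < x < 1.

```python
import random


def extremal_quotient(sample):
    """Q = largest observation divided by smallest observation."""
    return max(sample) / min(sample)


def empirical_cdf_at(q, n, draw, trials=20000, seed=0):
    """Monte Carlo estimate of Pr(Q < q) for samples of size n drawn with draw(rng)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        if extremal_quotient(sample) < q:
            hits += 1
    return hits / trials


def uniform_0_1(rng):
    return 1.0 - rng.random()  # lies in (0, 1], so the smallest value is never zero


def uniform_minus1_0(rng):
    return rng.random() - 1.0  # lies in [-1, 0), the same distribution shifted by -1
```

For n = 3 and q = 2 the first distribution gives a probability near $(1 - 1/2)^2 = 0.25$, whereas for the shifted distribution the quotient always lies between 0 and 1, so the same probability is 1.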


Consider now a uniform distribution where zero is within the domain of
variation of x,

$$F(x) = \omega_1 + x, \qquad f(x) = 1, \qquad -\omega_1 \leq x \leq \omega_2, \qquad \omega_1 + \omega_2 = 1. \tag{2.3}$$

Fig. 3. Densities $h_n(q)$ of the extremal quotient for the uniform distribution 0 < x < 1.

Fig. 4. Densities $h_n(q)$ of the extremal quotient Q for the uniform distribution −1 < x < 0.


TABLE 1

Initial distribution      0 < x < 1
Probability H(q)          $(1 - 1/q)^{n-1}$
Density h(q)              $(n - 1)(1 - 1/q)^{n-2} q^{-2}$
Domain                    $q \geq 1$
Figure                    Fig. 3

The general formulas (1.4) and (1.5) lead, after trivial calculations, to the
probability of the extremal quotient


a (i a ” ie li ms 1, 


" 


ee (1 <a q)” ‘wr, 


n [1 ot a (i Se or. 


$$H(0) = 1 - \omega_2^{n} - \omega_1^{n}, \qquad H(1) = 1 - \omega_2^{n}.$$
The density corresponding to this probability distribution is drawn for w, = } 
we = Zand n = 2 to 5 in Figure 5. 
As an example of a symmetrical limited distribution, we put $\omega_1 = \omega_2 = \tfrac12$.
These densities are drawn for n = 2, 3, 4 in Figure 6. The shapes of these two
series are, of course, completely unexpected.

Fig. 6. Densities of the extremal quotient for the uniform distribution $-\tfrac12 < x < \tfrac12$.



We now show how the methods have to be altered to cover the case where a
symmetrical unlimited distribution cannot be regarded as a single function,
as for instance in the so-called first Laplacean distribution, where the formulas
for the two symmetrical branches differ,

$$\begin{aligned}
f(x) &= f_1(x) = \tfrac12 e^{x}, & F(x) &= F_1(x) = \tfrac12 e^{x}, & x &\leq 0, \\
f(x) &= f_2(x) = \tfrac12 e^{-x}, & F(x) &= F_2(x) = 1 - \tfrac12 e^{-x}, & x &\geq 0.
\end{aligned} \tag{2.5}$$

TABLE 2 
Extremal quotients for n = 2


Name | Initial density f(x) | Condition on x | Density h(q)
0 1/[2(1 — q)?*] 
0 1/(2(1 + q)?] 


Laplace 


E eppnemiiel 2/(1 + q)* 
2/(1 + q)? 


20(2k)qe! 


Gamma e-*z*"' /T(k) 
| ‘ r2(k) (1 + q)* 








1 
m(1 + q°) 


Normal 





—lo; 8 
|—ox <zr< w - of |-* <e<- 


deities "a — ) 


In formula (1.4), which is valid even if the functional form varies under the
integral sign, we have to use $f_2$ and $F_2$ for positive values of x, and $f_1$ and $F_1$
for negative values of x. Accordingly we have

$$H(q) = \begin{cases}
1 - \left(\tfrac12\right)^{n} - n \displaystyle\int_{-\infty}^{0} [F_2(qx) - F_1(x)]^{n-1} f_1(x)\,dx, & q \leq 0, \\[2ex]
1 - \left(\tfrac12\right)^{n} - n \displaystyle\int_{-\infty}^{0} [F_1(qx) - F_1(x)]^{n-1} f_1(x)\,dx, & 0 < q < 1, \\[2ex]
1 - \left(\tfrac12\right)^{n} + n \displaystyle\int_{0}^{\infty} [F_2(x) - F_2(x/q)]^{n-1} f_2(x)\,dx, & q \geq 1.
\end{cases} \tag{2.6}$$

It is easily seen that the middle term holds for q > 0. Thus the probability
function and density consist of only two branches which join at q = 0.

The degenerate case, n = 2, is shown for different initial distributions in
Table 2.

If, as in the case of the Cauchy distribution, the initial distribution possesses
no moments and does not vanish at x = 0, the density of Q becomes infinite
at q = 0 for n = 2.



On the whole, the theory leads to surprisingly complicated results even for the 
simplest distributions as long as the sample size is small. 
THE ASYMPTOTIC DISTRIBUTION OF THE RANGE OF SUMS OF
INDEPENDENT RANDOM VARIABLES


By WILLIAM FELLER
Princeton University


Summary. The asymptotic distribution of the range and normalized range 
of the sum of n independent variables is derived using the theory of Brownian 
motion. 


1. Introduction. Let $\{X_k\}$ be a sequence of mutually independent random
variables with a common distribution V(x), and suppose that $E(X_k) = 0$,
$\operatorname{Var}(X_k) = 1$. Put $S_n = X_1 + \cdots + X_n$ and let

$$M_n = \max[0, S_1, S_2, \ldots, S_n], \qquad m_n = \min[0, S_1, S_2, \ldots, S_n]. \tag{1.1}$$

The random variable

$$R_n = M_n - m_n \tag{1.2}$$

will be called the range of the cumulative sums $S_n$.

In applications$^1$ it is advantageous to modify this definition. One considers
instead of the values of the sums $S_k$ their deviations from the straight line joining
the origin to the point $(n, S_n)$. Thus we replace the random variables $S_k$ by

$$S_k^* = S_k - k S_n / n \qquad (k = 1, \ldots, n) \tag{1.3}$$

and define the corresponding variables $M_n^*$, $m_n^*$, $R_n^*$ in analogy with (1.1) and
(1.2). The variable $R_n^*$ will be called the adjusted range of the cumulative sums $S_n$.

The adjusted range has a greater sampling stability, but its main advantage is
probably due to the fact that it eliminates the trend when $E(X_k) \neq 0$, so that it
can be used even when the means do not vanish.

It is practically impossible to calculate the exact distribution of the ranges
even for n = 3 and simple forms of the underlying distribution V(x). Now the
sums $S_n$ are obviously asymptotically normally distributed, and therefore the
asymptotic distribution of the ranges is independent of the form of V(x). It
suffices accordingly to consider the case where the variables $X_k$ are normal.
The sum $S_n$ can then be considered as the value at time t = n of a continuously
changing normal variable S(t) which is subject to a Bachelier-Wiener process
(or ordinary diffusion). Since the sequence $\{S_k\}$ is a subsequence of the values
assumed by S(t), the range $R_n$ is certainly not larger than the range at time
t = n of the variable S(t), and it is clear that for large n the two ranges will be
practically the same.

1 Cf. in particular Hurst [4]. A surprising statistical phenomenon discovered by Hurst is
discussed at the end of Section 2. The author is indebted to Mr. G. W. Alexander of the
State Rivers and Water Supply Commission, Melbourne, for drawing his attention to Hurst's
paper and the interesting statistical problems connected with it.

In Section 3 we shall find the exact distribution of the range R(t) of the continuous
variable S(t) [cf. (3.7)]. One gets in particular

$$E(R(n)) = 2(2n/\pi)^{1/2} = 1.5958 \ldots n^{1/2}, \qquad \operatorname{Var}(R(n)) = 4n(\log 2 - 2/\pi) = 0.2181 \ldots n. \tag{1.4}$$

These quantities are, asymptotically, the mean and variance of the range $R_n$.
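The constants in (1.4) and the exact moments of $R_n$ for small n can be checked by brute force. The sketch below is illustrative; the names are hypothetical, and the steps are assumed to take the values +1 and −1 with probability 1/2 each. Direct evaluation of $4(\log 2 - 2/\pi)$ with the natural logarithm gives approximately 0.2261, slightly different from the printed 0.2181, so the variance constant is worth recomputing before it is relied on.

```python
from itertools import product
from math import log, pi, sqrt

MEAN_COEFF = 2.0 * sqrt(2.0 / pi)        # E(R(n)) / n^(1/2) in (1.4)
VAR_COEFF = 4.0 * (log(2.0) - 2.0 / pi)  # Var(R(n)) / n in (1.4), evaluated directly


def exact_range_moments(n: int):
    """Exact mean and variance of R_n by enumerating all 2^n paths of +1/-1 steps."""
    total = 0.0
    total_sq = 0.0
    for steps in product((1, -1), repeat=n):
        s = hi = lo = 0  # M_n and m_n both include the starting value 0
        for step in steps:
            s += step
            hi = max(hi, s)
            lo = min(lo, s)
        r = hi - lo
        total += r
        total_sq += r * r
    count = 2 ** n
    mean = total / count
    return mean, total_sq / count - mean * mean
```

For n = 6 the enumeration gives a mean of 3.0625, below the asymptotic value 1.5958 times the square root of 6, about 3.909.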
For the adjusted range $R_n^*$ we have to introduce the corresponding continuously
changing variable

$$S^*(t) = S(t) - t S(T)/T \qquad (0 \leq t \leq T). \tag{1.5}$$

This variable appears more complicated than S(t). Fortunately the stochastic
process defined by (1.5) happens to be equivalent to a process which has been
studied in an exceedingly elegant and simple manner by Doob in connection with
his heuristic approach to the Kolmogorov-Smirnov theorems. Using Doob's
results it is easy to obtain the exact distribution of the adjusted range $R^*(T)$ for
the continuously changing variable S(t). It is given in (4.3) and represents the
desired asymptotic distribution of the adjusted range $R_n^*$ for n = T. One gets in
particular

$$E(R^*(T)) = (T\pi/2)^{1/2} = 1.2533 \ldots T^{1/2}, \qquad \operatorname{Var}(R^*(T)) = \left( \frac{\pi^2}{6} - \frac{\pi}{2} \right) T = 0.07414 \ldots T. \tag{1.6}$$

2. Discussion. A comparison of (1.4) and (1.6) shows that the adjusted range
has the advantage of greater sampling stability.

In order to get an idea about the goodness of the approximations (1.4) and
(1.6) we compare them with the exact values in the perhaps most unfavorable
case, namely where each variable $X_k$ assumes only the values ±1, each with
probability 1/2. For n = 6, 10, 12 we get

                  Exact value      Approximation (1.4)
E(R_6)            3.0625           3.909...
Var(R_6)          1.18360          1.309...
E(R_10)           4.1523...        5.046...
Var(R_10)         2.0872...        2.181...
E(R_12)           4.6377...        5.528...
Var(R_12)         2.545...         2.617...

For the adjusted ranges (and the same Bernoulli variables) the corresponding
figures are

                  Exact value      Approximation (1.6)
E(R*_6)           2                3.070...
Var(R*_6)         0.4396...        0.445...
E(R*_10)          2.954...         3.963...
Var(R*_10)        0.5822...        0.7414...

Considering the smallness of our n and the fact that the assumed distribution
of the $X_k$ is most unfavorable for our approximation, the above results appear
surprisingly good. They also bear out the expectation that the ranges of the
sums $S_n$ should be smaller than those of the corresponding continuously varying
variables S(t).

If the model of cumulative sums of independent random variables applies to a
particular type of empirical phenomena, then the observed ranges should, on the
average, increase with the square root of the length T of the observational period.
Now there is available a huge body of statistics concerning annual water levels
of rivers and lakes all over the world. It has naturally been assumed that such
levels could reasonably be treated as the cumulative effect of sums of random
variables, but in an interesting paper [4] H. E. Hurst discovered puzzling systematic
departures. In fact, Hurst has collected an impressively large statistical
material relating to water levels and other phenomena which seems to bear out
the contention that the observed adjusted ranges do not increase, as expected, like
the square root of the observational period T, but like a higher power $T^c$. The most
surprising feature is the stability of the observed values of the exponent c:
it varies only from 0.69 to 0.80, with a mean of 0.729 and standard deviation
0.092. Within the several separate groups of phenomena the stability of c is even
greater. Hurst himself has not attempted an explanation of his interesting
discovery.


It is conceivable that the phenomenon can be explained probabilistically, 
starting from the assumption that the variables X_n are not independent, but that
X_{n+1} depends only on the actual value of S_n. For example, a high lake level
creates additional outlets for the outflow and this in practice means a restoring
force towards the average size. Mathematically this would require treating the
variables X_n as a Markov process. In theory the method presented in this paper
applies to this more general case, but the simple ordinary diffusion equation 
would have to be replaced by a general Fokker-Planck equation, and the solution 
of the corresponding boundary value problem is not explicitly known. We are 
here confronted with a problem which is interesting from both a statistical and a 
mathematical point of view. 


3. The range. We have to deal with the variable S(t) of a Bachelier-Wiener
process; this means that S(t) is a normal variable with mean 0 and variance t
(t > 0), and the increment S(t + h) − S(t) is a normal variable with mean 0
and variance h which is independent of S(t) (and the values S(τ) for τ < t).
For fixed u > 0 and v > 0 we require the probability F(T; u, v) of the event

(3.1)    $M(T) < v, \qquad m(T) > -u,$

where M(T) ≥ 0 and m(T) ≤ 0 denote, respectively, the maximum and minimum
of S(t) for 0 ≤ t ≤ T. The corresponding probability density is given by the
mixed derivative

(3.2)    $f(T; u, v) = F_{uv}(T; u, v),$

and it is easily seen that the density function δ(T; r) of the range R(T) =
M(T) + |m(T)| is

(3.3)    $\delta(T; r) = \int_0^r f(T; u, r - u)\, du.$

To calculate F(T; u, v) we start from the density function w(t, x; u, v) of the
event that simultaneously S(t) = x, M(t) < v, and m(t) > −u. By the definition
of these functions we have

(3.4)    $F(T; u, v) = \int_{-u}^{v} w(T, x; u, v)\, dx,$

so that the required density δ(T; r) follows from w(T, x; u, v) by routine
calculations. Now it is easily seen² that w(t, x; u, v) is simply the fundamental
solution of the ordinary diffusion equation w_t = ½ w_xx for the interval −u < x < v
with the boundary conditions w(t, x; u, v) = 0 when x = −u or x = v. One
gets by the so-called method of images³


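(3.5)    $w(t, x; u, v) = t^{-1/2} \sum_{k=-\infty}^{\infty} \left\{ \varphi\!\left(\frac{2ku + 2kv - x}{\sqrt{t}}\right) - \varphi\!\left(\frac{2ku + (2k - 2)v + x}{\sqrt{t}}\right) \right\},$

where φ(x) stands for the normal density function with zero mean and unit
variance. Carrying out the indicated operations we find finally for the density
function of the range R(t)

(3.6)    $\delta(t; r) = 8\, t^{-1/2} \sum_{k=1}^{\infty} (-1)^{k-1} k^2\, \varphi\!\left(\frac{kr}{\sqrt{t}}\right).$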


In this form it is not even obvious that the function is positive, and it is
readily seen that the mean can not be obtained by termwise integration of
(3.6). Fortunately this function is closely related to the distribution function
L(z) which occurs in the Kolmogorov-Smirnov theorem on empirical distribution
functions. The distribution function L(z) can be written in two equivalent
forms⁴

(3.7)    $L(z) = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} \exp(-2k^2 z^2) = (2\pi)^{1/2} z^{-1} \sum_{k=1}^{\infty} \exp\!\left(-(2k - 1)^2 \pi^2 / 8z^2\right).$

Clearly

(3.8)    $\delta(t; r) = (2/\pi)^{1/2}\, r^{-1} L'\!\left(r/(2\sqrt{t})\right).$

The second representation in (3.7) shows that z^a L(z) → 0 as z → 0 for any a,
and hence an integration by parts shows that

(3.9)    $\int_0^\infty \delta(t; r)\, dr = 8\pi^{-2} \sum_{k=1}^{\infty} (2k - 1)^{-2} = 1.$

A similar procedure then leads to the formulas (1.4).

² The reasoning is substantially the same as in the case of discrete random walks (cf. [3],
chap. 14).

³ Cf. problem 5 on p. 304 of [3]. Formula (3.5) can be derived from the formula given there
by the passage to the limit described in section 6 of chapter 14. Cf. also [5], p. 213.
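The normalization (3.9) and the mean of the range can also be checked numerically from the alternating series for the density. The sketch below is illustrative only: it assumes the density has the form (8/√t) Σ (−1)^(k−1) k² φ(kr/√t), truncates the series at a fixed number of terms, and integrates by the trapezoidal rule on an interval outside which the density is negligible. The comparison value 2√(2t/π) is the standard mean range of the Wiener process on (0, t) and is stated here as a known fact rather than taken from this section.

```python
from math import exp, pi, sqrt


def phi(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def range_density(r, t=1.0, terms=400):
    """Truncated series (8/sqrt(t)) * sum_{k>=1} (-1)^(k-1) k^2 phi(k r / sqrt(t))."""
    root_t = sqrt(t)
    total = 0.0
    for k in range(1, terms + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        total += sign * k * k * phi(k * r / root_t)
    return 8.0 * total / root_t


def trapezoid(f, a, b, steps):
    """Composite trapezoidal rule with `steps` equal subintervals."""
    h = (b - a) / steps
    acc = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        acc += f(a + i * h)
    return acc * h


def check(t=1.0):
    """Return (total mass, mean) of the range density for the given t."""
    lo, hi = 0.1 * sqrt(t), 10.0 * sqrt(t)  # the density is negligible outside
    mass = trapezoid(lambda r: range_density(r, t), lo, hi, 990)
    mean = trapezoid(lambda r: r * range_density(r, t), lo, hi, 990)
    return mass, mean
```

Calling `check(1.0)` should return a mass close to 1 and a mean close to 2√(2/π); the lower integration limit is kept away from 0 because the alternating series needs many terms there while the density itself is vanishingly small.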


4. The adjusted range. We have now to find the range of the continuously changing
variable defined by (1.5). It is clear that S*(t) is normally distributed with
mean 0, variance t(T − t)/T, and Cov[S*(s), S*(t)] = s(T − t)/T, for 0 < s < t < T.
Thus the stochastic process defined by (1.5) is, for the particular value T = 1,
the process studied by Doob [1]. According to Doob a simple transformation
permits one to reduce (1.5) to the ordinary Bachelier-Wiener process with the
interval 0 < t < T going over into the entire interval 0 < t < ∞. This actually
simplifies matters inasmuch as the probabilities corresponding to (3.4) and
(3.5) are no longer time dependent, so that the preceding boundary value
problem for a partial differential equation is replaced by a simpler functional
equation. At any rate, Doob's last equation furnishes us with the probability
F(T; u, v) that S*(t) is for 0 < t < T contained in the interval (−u, v). We
have⁵

(4.1)    $F(T; u, v) = 1 + e(u + v) - \sum_{k=1}^{\infty} \{ e(ku + (k - 1)v) + e((k - 1)u + kv) - e(ku + kv) - e((k + 1)u + (k + 1)v) \},$


where we put for abbreviation

(4.2)    $e(x) = \exp(-2x^2/T).$

⁴ Cf. formula (1.4) of [2], where however a factor 2 is missing in the exponent.

⁵ Doob's formula looks simpler than (4.1), but the rearrangement (4.1) was necessary to
make it possible to perform the required differentiations and integrations in the routine
manner. (In the original form each term of the series contains a singular probability
distribution along the axes, and formal manipulations lead into apparent contradictions. Also,
Doob has T = 1.)


Formula (4.1) corresponds to (3.4), and it remains to perform the calculations
indicated in (3.2) and (3.3). In this way we get for the density function of the
range R*(T) of the variable S*(t)

(4.3)    $\delta(T; r) = r e''(r) + \sum_{k=2}^{\infty} \{ 2k(k - 1)[e'((k - 1)r) - e'(kr)] + (k - 1)^2 r e''((k - 1)r) + k^2 r e''(kr) \}.$


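To see how the moments are calculated note, for example, that

(4.4)    $\int_0^\infty r^3 e''(r)\, dr = -3 \int_0^\infty r^2 e'(r)\, dr = 6 \int_0^\infty r\, e(r)\, dr = 3T/2,$

and therefore

(4.5)    $\int_0^\infty r^2 \delta(T; r)\, dr = \frac{3T}{2} + \frac{T}{2} \sum_{k=2}^{\infty} \left\{ \frac{3}{(k-1)^2} + \frac{3}{k^2} - \frac{2k}{(k-1)^2} + \frac{2(k-1)}{k^2} \right\} = T \sum_{k=1}^{\infty} k^{-2} = \pi^2 T/6.$

In this way formulas (1.6) are obtained.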
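The two numerical facts used in this moment calculation are easy to confirm. The sketch below assumes e(x) = exp(−2x²/T), so that e''(x) = (16x²/T² − 4/T) e(x); it integrates r³ e''(r) by Simpson's rule and sums the series T Σ k⁻² directly. The helper names and the truncation points are illustrative choices.

```python
from math import exp, pi


def e(x, T):
    """The abbreviation e(x) = exp(-2 x^2 / T)."""
    return exp(-2.0 * x * x / T)


def e2(x, T):
    """Second derivative of e(x) with respect to x."""
    return (16.0 * x * x / (T * T) - 4.0 / T) * e(x, T)


def simpson(f, a, b, n):
    """Composite Simpson rule; n must be even."""
    h = (b - a) / n
    acc = f(a) + f(b)
    for i in range(1, n):
        acc += (4 if i % 2 else 2) * f(a + i * h)
    return acc * h / 3.0


def third_moment_term(T):
    """Numerical value of the integral of r^3 e''(r) over (0, infinity)."""
    upper = 12.0 * T ** 0.5  # e(r) is negligible beyond this point
    return simpson(lambda r: r ** 3 * e2(r, T), 0.0, upper, 4000)


def second_moment_series(T, terms=100000):
    """Partial sum of T * sum_{k>=1} k^(-2), which tends to pi^2 T / 6."""
    return T * sum(1.0 / (k * k) for k in range(1, terms + 1))
```

With T = 2, `third_moment_term(2.0)` should be close to 3T/2 = 3 and `second_moment_series(2.0)` close to π²/3.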
THE MOMENT PROBLEM FOR UNIMODAL DISTRIBUTIONS


By N. L. Johnson and C. A. Rogers
University College, London 


Summary. Certain inequalities are obtained for the moments of unimodal 
distributions. 


1. Introduction. Let n be one of the numbers 3, 5, 7, ···, or +∞. Let a
real number μ_r be given for each integer r with 1 ≤ r < n. It is known ([1],
Chap. III, Sec. 8-12; [2]) that, if there is a (cumulative) distribution function
F(x) such that

(1)    $\int_{-\infty}^{\infty} x^r\, dF(x) = \mu_r, \qquad 1 \le r < n,$

then

(2)    $\begin{vmatrix} 1 & \mu_1 & \cdots & \mu_s \\ \mu_1 & \mu_2 & \cdots & \mu_{s+1} \\ \vdots & \vdots & & \vdots \\ \mu_s & \mu_{s+1} & \cdots & \mu_{2s} \end{vmatrix} \ge 0$

for all integers s with 2 ≤ 2s < n. Conversely, it is known ([1], loc. cit.; [2]) that if
(2) is satisfied with strict inequality for all integers s with 2 ≤ 2s < n, then
there is a distribution function F(x) satisfying (1).

We say that a distribution function F(x) is unimodal, with mode M, if, for all
real numbers x_1, ···, x_4 satisfying

(3)    $x_1 < x_2 \le M \le x_3 < x_4,$

we have

(4)    $F(\tfrac12\{x_1 + x_2\}) \le \tfrac12\{F(x_1) + F(x_2)\}, \qquad F(\tfrac12\{x_3 + x_4\}) \ge \tfrac12\{F(x_3) + F(x_4)\}.$

We prove the following theorem.

THEOREM 1. Let n be one of the numbers 3, 5, 7, ···, or +∞. Let a real number
μ_r be given for each integer r with 1 ≤ r < n. Then there is a unimodal distribution
function F(x) with mode zero and with

(5)    $\int_{-\infty}^{\infty} x^r\, dF(x) = \mu_r, \qquad 1 \le r < n,$

if and only if there is a distribution function G(x) such that

(6)    $\int_{-\infty}^{\infty} x^r\, dG(x) = (r + 1)\mu_r, \qquad 1 \le r < n.$



By use of the special cases n = 3 and n = 5 of this theorem we obtain the 
following results. 

THEOREM 2. Let m, M and σ be real numbers, σ being positive. Then there will
be a unimodal distribution function with mean m, mode M, and standard deviation σ
if and only if

(7)    $(m - M)^2 \le 3\sigma^2.$
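Condition (7) is simple enough to state as a one-line test. The sketch below is only an illustration of how the inequality would be applied to a given triple (mean, mode, standard deviation); the function name and the example values are hypothetical.

```python
def satisfies_theorem_2(mean, mode, sd):
    """Return True when (m - M)^2 <= 3 sigma^2, the condition (7)."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (mean - mode) ** 2 <= 3.0 * sd ** 2


if __name__ == "__main__":
    # Exponential distribution with rate 1: mean 1, mode 0, standard deviation 1.
    print(satisfies_theorem_2(1.0, 0.0, 1.0))
    # A hypothetical triple that no unimodal distribution can realize.
    print(satisfies_theorem_2(3.0, 0.0, 1.0))
```

The first call prints True and the second False, since 9 exceeds 3.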


THEOREM 3. Let β_1 and β_2 be real numbers, β_1 being nonnegative. Then there
will be a unimodal distribution function with first and second moment-ratios β_1 and
β_2, respectively, if and only if

(8)    $5\beta_2 - 9 \ge y(24\beta_1),$

where, for all real γ, y(γ) denotes the largest number x satisfying

(9)    $9x^4 - 2\gamma x^3 - 36\gamma x^2 + 36\gamma^2 x + 36\gamma^2 - 6\gamma^3 = 0.$


It follows from Theorem 3 that a distribution cannot be unimodal if its
(β_1, β_2) point falls in the region bounded by the β_2-axis, the limiting line
β_2 − β_1 − 1 = 0, and the curve given by

(10)    $5\beta_2 - 9 = y(24\beta_1).$

This curve meets the β_2-axis at the point (0, 9/5). As β_1 increases, β_2 decreases
until the point (27/512, 27/16) is reached. Thereafter β_2 increases with β_1 and
the curve is asymptotic to the line

(11)    $60\beta_2 - 64\beta_1 - 81 = 0.$

The curve is given parametrically by

(12)    $\beta_1 = \frac{108 q^4}{(1 - q)(1 + 3q)^3}, \qquad 5\beta_2 - 9 = \frac{72 q^2 (3q - 1)}{(1 - q)(1 + 3q)^2} \qquad (0 \le q < 1).$
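The statements about the boundary curve can be checked in exact rational arithmetic. The following sketch assumes the parametrization β_1 = 108q⁴/((1 − q)(1 + 3q)³), 5β_2 − 9 = 72q²(3q − 1)/((1 − q)(1 + 3q)²) and the quartic 9x⁴ − 2γx³ − 36γx² + 36γ²x + 36γ² − 6γ³ with x = 5β_2 − 9 and γ = 24β_1; it confirms the two named points, that the quartic vanishes on the curve for a few values of q, and that the distance to the line (11) shrinks as q approaches 1. Function names are illustrative.

```python
from fractions import Fraction


def boundary_point(q):
    """(beta_1, beta_2) on the boundary curve for a parameter 0 <= q < 1."""
    q = Fraction(q)
    beta1 = 108 * q**4 / ((1 - q) * (1 + 3 * q) ** 3)
    x = 72 * q**2 * (3 * q - 1) / ((1 - q) * (1 + 3 * q) ** 2)  # x = 5*beta_2 - 9
    beta2 = (x + 9) / 5
    return beta1, beta2


def quartic(x, g):
    """Left-hand side of the quartic equation, with g standing for gamma."""
    return 9 * x**4 - 2 * g * x**3 - 36 * g * x**2 + 36 * g**2 * x + 36 * g**2 - 6 * g**3


def asymptote_gap(q):
    """Value of 60*beta_2 - 64*beta_1 - 81 at the curve point with parameter q."""
    beta1, beta2 = boundary_point(q)
    return 60 * beta2 - 64 * beta1 - 81


if __name__ == "__main__":
    print(boundary_point(0))
    print(boundary_point(Fraction(1, 5)))
    for q in (Fraction(1, 5), Fraction(1, 3), Fraction(1, 2)):
        b1, b2 = boundary_point(q)
        print(q, quartic(5 * b2 - 9, 24 * b1))
```

The point with q = 1/5 is the turning point (27/512, 27/16) mentioned in the text, and q = 0 gives (0, 9/5). The check does not verify that x is the largest root, only that it is a root.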


2. Proof of Theorem 1. We first suppose that F(x) is a unimodal distribution
function satisfying the conditions (5), and having mode zero. Then the conditions
(4) are satisfied for all x_1, ···, x_4 satisfying (3) with M = 0. Thus F(x) is a
nondecreasing function which is convex for x < 0 and concave for x > 0. It
follows that the one-sided differential coefficients F'_−(x), F'_+(x) exist for all
nonzero values of x, and are equal except possibly for an enumerable number of
values of x (see [3]). We define a function f(x) by the equations

(13)    $f(0) = 0; \qquad f(x) = F'_{+}(x), \quad x \ne 0.$

Then f(x) is a nonnegative function which is nondecreasing for x < 0 and which
is nonincreasing for x > 0. Further, if J is any closed bounded interval not
containing the point x = 0, the incremental ratio {F(y) − F(x)}/(y − x) is
bounded for all distinct points x and y of J (see [3], pp. 91-96). Hence F(x) is
absolutely continuous in J and thus is an indefinite integral of f(x) in J. Thus,
if we write

(14)    $j(x) = \begin{cases} F(-0), & x < 0, \\ F(0), & x = 0, \\ F(+0), & x > 0, \end{cases}$

we have

(15)    $F(x) = \int_0^x f(\xi)\, d\xi + j(x)$

for all x.
From the monotonicity properties of f(x) we have

$|x^{r+1} f(x)| \le 2^{r+1} \left| \int_{x/2}^{x} \xi^r f(\xi)\, d\xi \right| = 2^{r+1} \left| \int_{x/2}^{x} \xi^r\, dF(\xi) \right|,$

so that by the convergence of the integrals in (5)

(16)    $x^{r+1} f(x) \to 0 \quad \text{as } x \to \pm\infty$

for all r < n.
Now consider the function G(x) defined by

(17)    $G(x) = F(x) - x f(x).$

From (16), G(−x) → 0 and G(x) → 1 as x → +∞. Also from (15),

(18)    $G(x) = \int_0^x \{ f(\xi) - f(x) \}\, d\xi + j(x),$

so that G(x) is a nondecreasing function of x. Hence G(x) is a distribution
function.


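Let r be an integer with 1 ≤ r < n and let X and Y be positive numbers.
Then

(19)    $\begin{aligned} \int_{-Y}^{X} x^r\, dG(x) &= \int_{-Y}^{X} x^r\, dF(x) - \int_{-Y}^{X} x^r\, d\{x f(x)\} \\ &= \int_{-Y}^{X} x^r\, dF(x) - \left[ x^{r+1} f(x) \right]_{-Y}^{X} + r \int_{-Y}^{X} x^r f(x)\, dx \\ &= (r + 1) \int_{-Y}^{X} x^r\, dF(x) - X^{r+1} f(X) + (-Y)^{r+1} f(-Y), \end{aligned}$

the integration by parts being justified since x f(x) is of bounded variation by
(17), and the final step being justified by (15). Hence, using (16) and (5),

(20)    $\int_{-\infty}^{\infty} x^r\, dG(x) = (r + 1) \int_{-\infty}^{\infty} x^r\, dF(x) = (r + 1)\mu_r,$

for all r with 1 ≤ r < n. This proves the second assertion of the theorem.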
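We note that, since F(x)/x is absolutely continuous in every closed bounded
interval not including the point x = 0, it follows by (17) that the function F(x)
is given in terms of G(x) by

(21)    $F(x) = \begin{cases} -x \int_{-\infty}^{x} \xi^{-2} G(\xi)\, d\xi, & x < 0, \\ G(0), & x = 0, \\ x \int_{x}^{\infty} \xi^{-2} G(\xi)\, d\xi, & x > 0, \end{cases}$

or, on integration by parts, by

(22)    $F(x) = \begin{cases} \int_{-\infty}^{x} (1 - x\xi^{-1})\, dG(\xi), & x < 0, \\ G(0), & x = 0, \\ 1 - \int_{x}^{\infty} (1 - x\xi^{-1})\, dG(\xi), & x > 0. \end{cases}$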


We now prove the first assertion of the theorem. Suppose that there exists a
distribution function G(x) with

(23)    $\int_{-\infty}^{\infty} x^r\, dG(x) = (r + 1)\mu_r, \qquad 1 \le r < n.$

Let F(x) be the corresponding function defined by the equation (22). Then it is
clear that F(x) is a nondecreasing function of x and that F(−x) → 0 and F(x) → 1
as x → +∞. Hence F(x) is a distribution function. Also, if a, h, b, k are any
real numbers with

$a - h < a < a + h < 0 < b - k < b < b + k,$

then

(24)    $F(a + h) - 2F(a) + F(a - h) = \int_{a-h}^{a} \xi^{-1}(a - h - \xi)\, dG(\xi) + \int_{a}^{a+h} \xi^{-1}(\xi - a - h)\, dG(\xi) \ge 0,$

and a similar argument shows that

(25)    $F(b + k) - 2F(b) + F(b - k) \le 0.$

Hence the conditions (4) are satisfied whenever x_1, ···, x_4 satisfy (3) with
M = 0. Thus F(x) is a unimodal distribution function with mode at x = 0.
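Now, if x < 0,

(26)    $F(x) = \int_{-\infty}^{x} (1 - x\xi^{-1})\, dG(\xi) = \int_{-\infty}^{x} \left\{ \int_{\xi}^{x} (-\xi^{-1})\, d\eta \right\} dG(\xi) = \int_{-\infty}^{x} \left\{ \int_{-\infty}^{\eta} (-\xi^{-1})\, dG(\xi) \right\} d\eta;$

and similarly, if x > 0,

(27)    $F(x) = 1 - \int_{x}^{\infty} \left\{ \int_{\eta}^{\infty} \xi^{-1}\, dG(\xi) \right\} d\eta.$

Consequently

$\begin{aligned} \int_{-\infty}^{\infty} x^r\, dF(x) &= \int_{-\infty}^{0} x^r \left\{ \int_{-\infty}^{x} (-\xi^{-1})\, dG(\xi) \right\} dx + \int_{0}^{\infty} x^r \left\{ \int_{x}^{\infty} \xi^{-1}\, dG(\xi) \right\} dx \\ &= \int_{-\infty}^{0} (-\xi^{-1}) \left\{ \int_{\xi}^{0} x^r\, dx \right\} dG(\xi) + \int_{0}^{\infty} \xi^{-1} \left\{ \int_{0}^{\xi} x^r\, dx \right\} dG(\xi) \\ &= (r + 1)^{-1} \int_{-\infty}^{\infty} \xi^r\, dG(\xi) = \mu_r, \end{aligned}$

for all r with 1 ≤ r < n. Thus F(x) is a unimodal distribution with mode zero,
satisfying (5). This completes the proof of the theorem.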

By combining Theorem 1 with the results quoted in Section 1 we obtain the 
following 


COROLLARY. If there is a unimodal distribution function F(x) with mode zero
and with

(28)    $\int_{-\infty}^{\infty} x^r\, dF(x) = \mu_r, \qquad 1 \le r < n,$

the condition

(29)    $\begin{vmatrix} 1 & 2\mu_1 & \cdots & (s + 1)\mu_s \\ 2\mu_1 & 3\mu_2 & \cdots & (s + 2)\mu_{s+1} \\ \vdots & \vdots & & \vdots \\ (s + 1)\mu_s & (s + 2)\mu_{s+1} & \cdots & (2s + 1)\mu_{2s} \end{vmatrix} \ge 0$

is satisfied for each integer s with 2 ≤ 2s < n. Conversely, if (29) is satisfied with
strict inequality for all integers s with 2 ≤ 2s < n, then there is a unimodal distribution
function F(x) having mode zero and satisfying (28).


3. Proof of Theorem 2. First suppose that there is a unimodal distribution
function with mean m, mode M and standard deviation σ. Then there is a unimodal
distribution function F(x) having mode zero and satisfying (5) with

(30)    $n = 3, \qquad \mu_1 = m - M, \qquad \mu_2 = \sigma^2 + (m - M)^2.$

So by the case n = 3 of the Corollary to Theorem 1 we have

$\begin{vmatrix} 1 & 2(m - M) \\ 2(m - M) & 3\sigma^2 + 3(m - M)^2 \end{vmatrix} \ge 0,$

that is,

$(m - M)^2 \le 3\sigma^2.$

Now suppose that m, M and σ are real numbers, with σ > 0, satisfying (7).
To prove the existence of a unimodal distribution with mean m, mode M and
standard deviation σ, it clearly suffices to prove the existence of a unimodal
distribution function F(x) having mode zero and satisfying (5) when the condition
(30) is satisfied. Hence, by Theorem 1 it suffices to prove the existence of a
distribution function G(x) with 2(m − M) and 3σ² + 3(m − M)² for its first
and second moments, respectively. Since

$3\sigma^2 + 3(m - M)^2 \ge \{2(m - M)\}^2$

by (7), the existence of such a function G(x) is clear. This proves the theorem.


4. Proof of Theorem 3. We first state without proof an elementary algebraic 
lemma. 


LEMMA. Let β_1, β_2 be real numbers with β_1 ≥ 0. Suppose that there is a real
number δ satisfying

(31)    $3 - \delta^2 > 0,$

(32)    $(3 - \delta^2)(5\beta_2 - 9 - 4\delta\sqrt{\beta_1}) \ge 16\beta_1;$

then

(33)    $5\beta_2 - 9 \ge y(24\beta_1),$

where y(γ) is the function defined in the statement of Theorem 3. Conversely, if
(33) is satisfied with strict inequality, then there is a real δ satisfying (31) and
satisfying (32) with strict inequality.

Now suppose that there is a unimodal distribution function with first and
second moment-ratios β_1 and β_2 respectively. Then there is a unimodal distribution
function H(x) with the numbers 0, 1, √β_1, β_2 as its first four moments.
Let δ be the mode of H(x). Then the function F(x) = H(x + δ) is a unimodal
distribution function with mode zero and moments

(34)    $\mu_1 = -\delta, \qquad \mu_2 = 1 + \delta^2, \qquad \mu_3 = \sqrt{\beta_1} - 3\delta - \delta^3, \qquad \mu_4 = \beta_2 - 4\delta\sqrt{\beta_1} + 6\delta^2 + \delta^4.$

So these numbers μ_1, ···, μ_4 satisfy the condition (29) for s = 1 and for s = 2.
These conditions reduce to the inequalities

(35)    $3 - \delta^2 \ge 0,$

(36)    $(3 - \delta^2)(5\beta_2 - 9 - 4\delta\sqrt{\beta_1}) \ge 16\beta_1.$

When 3 − δ² > 0 it follows from the Lemma that the condition (8) is satisfied.
When δ = ±√3 we note that the function G(x) defined in the proof of Theorem 1
has first and second moments 2μ_1 = ∓2√3, 3μ_2 = 12, and so has standard
deviation zero. Thus the rth moment of G(x) is (r + 1)μ_r = (∓2√3)^r. This
implies that β_1 = 0, β_2 = 9/5, so that the inequality (8) is satisfied in this case.
Now suppose that (8) is satisfied. First consider the case when (8) is satisfied
with strict inequality. Then by the Lemma there is a number δ satisfying the
conditions (35) and (36) with strict inequality. Hence the numbers μ_1, ···, μ_4
defined by (34) satisfy the condition (29) with strict inequality for s = 1 and
for s = 2. So by the Corollary to Theorem 1 there is a unimodal distribution
function with moments μ_1, ···, μ_4 satisfying (34). This unimodal distribution
function has β_1 and β_2 for its first and second moment-ratios.

We have finally to consider the case when (8) is satisfied with equality. Since
the function 108q⁴(1 − q)⁻¹(1 + 3q)⁻³ increases from the value 0 and tends to
+∞ as q increases from 0 and tends to 1, we can choose a number q with 0 ≤ q < 1
such that

(37)    $\beta_1 = 108 q^4 (1 - q)^{-1} (1 + 3q)^{-3}.$

Then, since (8) is satisfied with equality, we have

(38)    $5\beta_2 - 9 = 72 q^2 (3q - 1)(1 - q)^{-1} (1 + 3q)^{-2}.$

It is not difficult to verify that the distribution function F(x) defined by

(39)    $F(x) = \begin{cases} 0, & x < 0, \\ q + (1 - q)x, & 0 \le x \le 1, \\ 1, & x > 1, \end{cases}$

has β_1 and β_2 for its first and second moment-ratios. This completes the proof
of the theorem.
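The verification left to the reader at the end of the proof can be carried out exactly. The sketch below assumes the extremal distribution places mass q at 0 and has constant density 1 − q on (0, 1), so that its raw moments are E X^r = (1 − q)/(r + 1); it then forms the central moments and the two moment-ratios with rational arithmetic. The function name is illustrative.

```python
from fractions import Fraction


def moment_ratios(q):
    """(beta_1, beta_2) of the distribution with an atom q at 0 and
    density 1 - q on the interval (0, 1); requires 0 <= q < 1."""
    q = Fraction(q)
    p = 1 - q
    # Raw moments E X^r for r = 1, ..., 4 (the atom at 0 contributes nothing).
    m1, m2, m3, m4 = (p / (r + 1) for r in range(1, 5))
    var = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return mu3**2 / var**3, mu4 / var**2


if __name__ == "__main__":
    for q in (0, Fraction(1, 5), Fraction(1, 2)):
        print(q, moment_ratios(q))
```

For q = 1/5 this returns (27/512, 27/16), and for q = 0 (the uniform distribution) it returns (0, 9/5), in agreement with the boundary curve described in Section 1.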
NOTE ON UNIFORMLY BEST UNBIASED ESTIMATES¹
By R. C. Davis 


Pasadena, California 


1. Summary and introduction. Bhattacharyya [1] has considered recently the
following problem in statistical estimation. Let X_1, X_2, ···, X_n be n stochastic
variables distributed according to the probability law f(x_1, x_2, ···, x_n; θ)
dx_1 dx_2 ··· dx_n, where θ is the unknown parameter. Consider the class of all
functions T(x_1, x_2, ···, x_n) of the stochastic variables such that the expectation
of each function in this class is equal to a preassigned function τ(θ). Usually τ(θ)
admits of more than one unbiased estimate, and the problem posed by various
authors is to obtain a lower bound of the variances of all such estimates, this
lower bound to be independent of the estimates themselves but depending on
τ(θ) and the distribution function of the n stochastic variables. Under certain
regularity conditions Bhattacharyya obtained a lower bound of the above type
which is never lower than the one obtained earlier independently by Cramér [2]
and Rao [3], although the conditions assumed by Bhattacharyya are more
restrictive than those assumed by the latter authors. Recently E. W. Barankin
in a remarkable paper [4] has developed a procedure which yields the class of
lower bounds of unbiased estimates having minimum sth absolute central
moment (s > 1) at a preassigned parameter value θ_0. In this note we are concerned
with the attainment of a lower bound obtained first by Bhattacharyya.
Bhattacharyya discusses the case in which his lower bound is attained and
derives some interesting properties of the distribution of such a statistic (which
might be called a generalized efficient statistic).

The purpose of this note is to prove that in the case in which the variables
X_1, X_2, ···, X_n are independently and identically distributed with a common
distribution function F(x; θ) depending upon a single unknown parameter,
one obtains the following result: under the regularity conditions assumed by
Bhattacharyya in which the parameter θ may assume values in an interval
of the real axis, and with an additional slight restriction on the cumulative
distribution function F(x; θ), no generalized efficient statistic exists which is
constructed by use of both the first and second derivatives of the likelihood
function with respect to the parameter. It follows that if an efficient estimate
(in the sense originally defined by Fisher [5]) for the single unknown parameter
does not exist, then no distribution F(x; θ) exists possessing a uniformly minimum
variance unbiased estimate of τ(θ) which is constructed by using a linear combination
of the first and second partial logarithmic derivatives of the likelihood
function. This result for the case involving a single unknown parameter is
peculiarly of interest in view of the fact that Seth [6] has given an example
in which the above construction is possible if the distribution involves two
unknown parameters.

¹ Presented at the meeting of the Institute of Mathematical Statistics on 18 March 1950
at the University of North Carolina.

2. A theorem. Let X be a chance variable possessing an absolutely continuous
probability distribution F(x; θ) in which θ is the single unknown parameter.
Denote by f(x; θ) the probability density function of X, this function existing
almost everywhere. Consider a finite sequence {X_i}, i = 1, 2, ···, n, of independently
distributed chance variables possessing the common distribution
function F(x; θ). We restrict ourselves in this note to unbiased estimates of τ(θ)
which are functions T_n(x_1, x_2, ···, x_n), where x_i is a random observation of X_i.
Denote also by L(x_1, x_2, ···, x_n; θ) the likelihood function of X_1, X_2, ···, X_n,
so that in the case considered in this note

$L(x_1, x_2, \cdots, x_n; \theta) = f(x_1; \theta)\, f(x_2; \theta) \cdots f(x_n; \theta).$

We denote by E_n n-dimensional Euclidean space.
Following Bhattacharyya we make the following assumptions concerning
F(x; θ):
ASSUMPTION A. X assumes values x in E_1 and the true parameter value θ lies
in an interval I ⊂ E_1.
ASSUMPTION B. F(x; θ) is absolutely continuous in x.
ASSUMPTION C. ∂L/∂θ, ∂²L/∂θ², ∂³L/∂θ³ exist almost everywhere in E_n, and for every θ ∈ I.
ASSUMPTION D. ∂L/∂θ and ∂²L/∂θ² are linearly independent for almost all points
x_1, x_2, ···, x_n in E_n.
ASSUMPTION E. |∂^i L/∂θ^i| < G_i(x_1, x_2, ···, x_n), i = 1, 2, for all θ ∈ I, where
G_i(x_1, x_2, ···, x_n) is integrable with respect to F over (−∞, ∞).
ASSUMPTION F. dτ/dθ and d²τ/dθ² exist for all θ ∈ I.
ASSUMPTION G. $J_{ij} = E\!\left[ \frac{1}{L} \frac{\partial^i L}{\partial \theta^i} \cdot \frac{1}{L} \frac{\partial^j L}{\partial \theta^j} \right]$ exists for each i, j (i = 1, 2; j = 1, 2).

In this note we make an additional assumption concerning the density function
f(x; θ).

ASSUMPTION H. There exists a closed interval A (A ⊂ E_1) such that, for θ ∈ I and
x ∈ A, f(x; θ) > 0 and is continuous in x. Moreover, ∂f(x; θ)/∂θ ≠ 0.

If we denote by (J^{ij}) the matrix inverse to (J_ij), Bhattacharyya deduces
the following inequality for the variance, V(T_n), of any unbiased estimate
T_n(x_1, x_2, ···, x_n) of τ(θ):

(1)    $V(T_n) \ge \sum_{i=1}^{2} \sum_{j=1}^{2} J^{ij}\, \tau^{(i)} \tau^{(j)},$

where

$\tau^{(i)} = \frac{d^i \tau(\theta)}{d\theta^i}.$

Equation (1) can become an equality if and only if the following equation
holds for almost all x_1, x_2, ···, x_n in E_n:

(2)    $T_n - \tau = \sum_{i=1}^{2} \lambda_i\, \frac{1}{L} \frac{\partial^i L}{\partial \theta^i},$

where

$\lambda_i = \sum_{j=1}^{2} J^{ij}\, \tau^{(j)}.$
When equation (2) holds, it is clear that the statistic T_n becomes a generalization
of Fisher's efficient statistic. In view of the desirable property of minimizing
the variance among the class of unbiased estimates of τ(θ), it is clearly worth
while to attempt to characterize the distributions for which equation (2) is valid.
We prove the following theorem:

THEOREM 1. In the absence of an efficient statistic for τ(θ) there exists no cumulative
distribution function F(x; θ) satisfying Assumptions (A)-(H) which yields for
any sample size, n, a statistic T_n(x_1, x_2, ···, x_n) satisfying equation (2) for all
θ ∈ I and almost all (x_1, x_2, ···, x_n) in E_n.

PROOF. We prove the theorem by showing that equation (2) leads to an
impossibility if F(x; θ) satisfies Assumptions A-H. First we transform equation
(2) into a partial differential equation in ln L(x_1, x_2, ···, x_n). To simplify the
presentation we give the proof for the case τ(θ) = θ, but we lose no generality
in doing so. Equation (2) then assumes the form

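(3)    $T_n - \theta = \lambda_1^n\, \frac{\partial}{\partial\theta} \ln L + \lambda_2^n \left[ \frac{\partial^2}{\partial\theta^2} \ln L + \left( \frac{\partial}{\partial\theta} \ln L \right)^2 \right],$

in which

$\lambda_1^n = \frac{j_{22} + 2(n - 1) j_{11}^2}{n\,[\, j_{11} j_{22} - j_{12}^2 + 2(n - 1) j_{11}^3 \,]}, \qquad \lambda_2^n = \frac{-\,j_{12}}{n\,[\, j_{11} j_{22} - j_{12}^2 + 2(n - 1) j_{11}^3 \,]},$

$j_{ik} = E\!\left\{ \frac{1}{f} \frac{\partial^i f}{\partial\theta^i} \cdot \frac{1}{f} \frac{\partial^k f}{\partial\theta^k} \right\}.$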
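For τ(θ) = θ the right-hand side of inequality (1) reduces to J^{11}, which for n independent observations can be written in terms of the single-observation quantities j_11, j_12, j_22. The sketch below assumes J_11 = n j_11, J_12 = n j_12 and J_22 = n j_22 + 2n(n − 1) j_11², inverts the 2 × 2 matrix, and compares the result with the first-order (Cramér-Rao) bound 1/(n j_11). The numerical inputs in the example are hypothetical and do not come from any particular density.

```python
def second_order_bound(n, j11, j12, j22):
    """J^{11} for n i.i.d. observations, i.e. the bound (1) when tau(theta) = theta."""
    det = j11 * j22 - j12 ** 2 + 2 * (n - 1) * j11 ** 3
    return (j22 + 2 * (n - 1) * j11 ** 2) / (n * det)


def cramer_rao_bound(n, j11):
    """First-order bound 1 / (n * j11) for tau(theta) = theta."""
    return 1.0 / (n * j11)


if __name__ == "__main__":
    # Hypothetical single-observation quantities, chosen only for illustration.
    n, j11, j12, j22 = 5, 1.0, 0.5, 2.0
    print(second_order_bound(n, j11, j12, j22), cramer_rao_bound(n, j11))
```

The difference of the two bounds is j_12² divided by n j_11 times the bracketed determinant term, so the second-order bound exceeds the first-order one exactly when j_12 ≠ 0, which is the case the theorem is concerned with.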


As stated already, equation (3) need not be valid for a set of points
(x_1, x_2, ···, x_n) having measure zero. Denote by A^n the closed cube in E_n
defined by x_i ∈ A, where A is the closed interval defined in Assumption H. It is
clear from Assumption H that there exist points in A^n for which equation (3) is
valid. We choose one such point (x_1, x_2, ···, x_n) and consider this as a fixed
point in A^n for the subsequent analysis. For each θ in the interval I, we denote
by x(θ) the value of x given by the transformation

(4)    $\ln f(x; \theta) = \frac{1}{n} \ln L(x_1, x_2, \cdots, x_n; \theta).$

Since ∂/∂θ ln L and ∂²/∂θ² ln L exist almost everywhere in E_n by Assumption C,
it is clear that d/dθ ln f(x(θ); θ) and d²/dθ² ln f(x(θ); θ) exist. We choose the fixed point
(x_1, x_2, ···, x_n) so that the above derivatives exist and also so that equation (3)
is valid. For each θ ∈ I, the following equation is valid:

(5)    $T_n - \theta = n\lambda_1^n\, \frac{d}{d\theta} \ln f(x; \theta) + n\lambda_2^n \left[ \frac{d^2}{d\theta^2} \ln f(x; \theta) + n \left( \frac{d}{d\theta} \ln f(x; \theta) \right)^2 \right].$

. ° ° 0- - ° f° ° 
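Substituting the values of λ_1^n and λ_2^n in (5) and simplifying, we obtain the expression

(6)    $T_n - \theta = \frac{(j_{22} - 2j_{11}^2)\, \frac{d}{d\theta} \ln f(x; \theta) - j_{12}\, \frac{d^2}{d\theta^2} \ln f(x; \theta) + n \left[ 2j_{11}^2\, \frac{d}{d\theta} \ln f(x; \theta) - j_{12} \left( \frac{d}{d\theta} \ln f(x; \theta) \right)^2 \right]}{j_{11} j_{22} - j_{12}^2 - 2j_{11}^3 + 2n\, j_{11}^3}.$

To simplify matters we write this simply as

(7)    $T_n - \theta = \frac{a(x(\theta), \theta) + n\, b(x(\theta), \theta)}{c(\theta) + n\, e(\theta)}.$

Since this equation is valid for every θ ∈ I, we can differentiate both sides of
the equation with respect to θ. We obtain a quadratic polynomial in n, which
we write as

(8)    $\alpha n^2 + \beta n + \gamma = 0,$

in which

$\alpha = e^2\, \frac{d}{d\theta} \left( \frac{b}{e} + \theta \right), \qquad \beta = e^2\, \frac{d}{d\theta} \left( \frac{a}{e} \right) + c^2\, \frac{d}{d\theta} \left( \frac{b}{c} \right) + 2ce, \qquad \gamma = c^2\, \frac{d}{d\theta} \left( \frac{a}{c} + \theta \right).$

The two roots of (8) are given by

$n = \frac{-\beta \pm \sqrt{\beta^2 - 4\alpha\gamma}}{2\alpha}.$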
nm 
Since by assumption at least one of these roots is a positive integer, we obtain
the relations

$\beta = 2\alpha N_1, \qquad \sqrt{\beta^2 - 4\alpha\gamma} = 2\alpha N_2,$

where N_1 and N_2 are integers. From these two relationships, we can deduce that

$\gamma = (N_1^2 - N_2^2)\,\alpha.$

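Referring to the definition of α and γ and performing two quadratures, we
obtain the equations

(9)    $\frac{1}{j_{11}}\, \frac{d}{d\theta} \ln f(x; \theta) - \frac{j_{12}}{2j_{11}^3} \left[ \frac{d}{d\theta} \ln f(x; \theta) \right]^2 = T_n - \theta,$

(10)    $\left[ \frac{j_{22} - 2j_{11}^2}{j_{11} j_{22} - j_{12}^2 - 2j_{11}^3} \right] \frac{d}{d\theta} \ln f(x; \theta) - \left[ \frac{j_{12}}{j_{11} j_{22} - j_{12}^2 - 2j_{11}^3} \right] \frac{d^2}{d\theta^2} \ln f(x; \theta) = T_n - \theta.$

The solution of the quadratic equation in d/dθ ln f(x; θ) in (9) yields

(11)    $\frac{d}{d\theta} \ln f(x; \theta) = M(\theta) + \sqrt{N(\theta) + Q(\theta)[T_n - \theta]},$

and the integration of the first order differential equation in d/dθ ln f(x; θ) given
in (10) yields

(12)    $\frac{d}{d\theta} \ln f(x; \theta) = G(\theta)\, T_n + H(\theta) + R(\theta).$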

It is clear from inspection of equations (9) through (12) that the solutions to (9)
and (10) are identical if and only if j_12 = 0. Since j_12 is proportional to λ_2^n in (3), the
vanishing of j_12 implies that the statistic T_n(x_1, x_2, ···, x_n) is formed only from
the first partial derivative of the likelihood function and hence is an efficient
statistic. This is contrary to the assumption of the theorem. The possibility that
each side of equations (9) and (10) vanish identically is ruled out by the part of
Assumption H in which it is stated that ∂/∂θ ln f(x; θ) ≠ 0 for x ∈ A. Hence our
assumption that equation (2) holds leads to a contradiction of the assumptions
of the theorem. We conclude that there exists no cumulative probability distribution
function (satisfying the assumptions of the theorem) which yields a generalized
efficient statistic for any sample size n.
EXACT TESTS OF SERIAL CORRELATION USING
NONCIRCULAR STATISTICS


By G. S. Watson and J. Durbin
University of Cambridge and London School of Economics


1. Summary and introduction. For testing the hypothesis that successive 
members of a series of observations are independent J. von Neumann [5] 
(see also B. I. Hart [4]) and R. L. Anderson [1] have proposed test statistics and 
tabulated their significance points. von Neumann’s criterion seems well designed 
to detect deviations from the null hypothesis which might be encountered in 
practice but its exact distribution is unknown. On the other hand Anderson’s 
statistic, while it has a known distribution, is based on a circular conception of 
the population which is rarely plausible in practice. 

In the present note certain noncircular statistics are proposed for which exact 
distributions can be obtained from Anderson’s results. The statistics are derived 
from the usual noncircular statistics by sacrificing a small amount of relevant 
information. Their application is noted to certain regression problems for which 
no satisfactory tests are at present available. Finally, some general remarks are 
made about the choice of best statistics for the problems discussed. 


2. Proposed statistics. Consider the ratio

(1)    $r = \frac{x'Ax}{x'x},$

where x = {x_1, x_2, ···, x_n} is a column vector of independent normal variables with
zero means and constant variance and A is a real symmetric matrix. Then the
exact distribution of r is at present known only when the characteristic roots of
A all have the same even multiplicity except at most two of arbitrary multiplicity.
Thus in particular the distribution of r is known if A can be written as

(2)    $A = \begin{pmatrix} B & 0 & 0 \\ 0 & \lambda I_p & 0 \\ 0 & 0 & B \end{pmatrix},$

where B is a real symmetric l × l matrix with distinct roots ν_1 > ν_2 > ··· > ν_l,
I_p is the unit matrix of order p, and λ satisfies either λ ≤ ν_l or λ ≥ ν_1. Using
the results of R. L. Anderson [1], we give the distribution of r when λ is not equal
to ν_1 or ν_l. For λ < ν_l


, = (y — rite , ) 
(3) P(r >r) = eater wn te ta 
int (4 = )” Il’ on ee Ve+1 > = 


and for 


rf 


(4) ;, viding (mua Se Sw), 


1 0- WIG — 


I 


)! +}p 1 
IT’ @& — ») = IL — », 


jak 
ipt 


i 
IT’ (yj — 4) = II (vj; — vi). 
j j=l 
jgst 
If λ equals ν_1 or ν_l, the distribution of r may be obtained from (3) or (4) by a
renumbering of the roots. These expressions remain correct for p = 0 or


aliet 


The probability densities can be derived by differentiation. 
For testing the hypothesis of serial independence in a set of observations with
a zero (or known) mean we propose the following statistics:

(5)    $c_1 = \frac{x_1 x_2 + \cdots + x_{m-1} x_m + x_{m+1} x_{m+2} + \cdots + x_{n-1} x_n}{\sum_{i=1}^{n} x_i^2} \qquad (n = 2m),$

$c_2 = \frac{x_1 x_2 + \cdots + x_{m-1} x_m + x_{m+1}^2 + x_{m+2} x_{m+3} + \cdots + x_{n-1} x_n}{\sum_{i=1}^{n} x_i^2}, \qquad c_3 = \frac{x_1 x_2 + \cdots + x_{m-1} x_m + x_{m+2} x_{m+3} + \cdots + x_{n-1} x_n}{\sum_{i=1}^{n} x_i^2 - x_{m+1}^2} \qquad (n = 2m + 1).$

It is easily seen that these statistics can be written in the form (1) with A having
the form (2). Here

$B = \frac{1}{2} \begin{pmatrix} 0 & 1 & & & \\ 1 & 0 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & 0 & 1 \\ & & & 1 & 0 \end{pmatrix},$

which has characteristic roots ν_i = cos(πi/(m + 1)) (i = 1, ···, m),
and λ = 1. Thus the distribution of c_1 is given by (3) and (4) with these ν_i's, l = m
and p = 0. The distribution of c_2 is given by (4) with l = m, p = 1, and λ = 1.
In c_3, x_{m+1} has merely been omitted altogether so that c_3 has the same form
and therefore the same distribution as c_1.
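The characteristic roots quoted for B can be confirmed directly. The sketch below assumes B is the m × m matrix with ½ on the two diagonals adjacent to the main diagonal and zeros elsewhere, which is the matrix of the quadratic form x_1x_2 + ··· + x_{m−1}x_m; it checks that the vector with components sin(πij/(m + 1)) is a characteristic vector for the root cos(πi/(m + 1)), and it evaluates c_1 for a sample of even length. The helper names are illustrative.

```python
from math import cos, pi, sin


def b_matrix(m):
    """m x m matrix with 1/2 on the sub- and superdiagonal, 0 elsewhere."""
    B = [[0.0] * m for _ in range(m)]
    for i in range(m - 1):
        B[i][i + 1] = B[i + 1][i] = 0.5
    return B


def root(i, m):
    """The i-th characteristic root, nu_i = cos(pi * i / (m + 1))."""
    return cos(pi * i / (m + 1))


def eigen_residual(i, m):
    """Largest component of B v - nu_i v for v_j = sin(pi * i * j / (m + 1))."""
    B = b_matrix(m)
    v = [sin(pi * i * j / (m + 1)) for j in range(1, m + 1)]
    nu = root(i, m)
    Bv = [sum(B[r][c] * v[c] for c in range(m)) for r in range(m)]
    return max(abs(Bv[r] - nu * v[r]) for r in range(m))


def c1(x):
    """Statistic c_1 for n = 2m observations with zero (or known) mean:
    the cross-product x_m * x_{m+1} joining the two halves is left out."""
    n = len(x)
    if n % 2:
        raise ValueError("c1 is defined for an even number of observations")
    m = n // 2
    num = sum(x[i] * x[i + 1] for i in range(m - 1))
    num += sum(x[i] * x[i + 1] for i in range(m, n - 1))
    return num / sum(v * v for v in x)
```

For the sample (1, 2, 3, 4) the statistic is (1·2 + 3·4)/30 = 14/30, which lies below the largest root ν_1 = cos(π/3) = ½, as it must for a ratio of the form (1).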

An alternative set of statistics is 
+--+ + (Gm — tn)” 
— + (2m41 — Tm42)” +--+ + Gai — 2,)? 


n 

2 

D2 
1 


° 


(x, — 22) 


and 
(a, — a)* +--+: + (2m — Im)” 


a + (Zm+2 ore. Lima) + ah + (tn a Za)" 
dy eux oie oso as ch, r ie (n 


2 
> x; 


qd. — numerator of de 
>» ay - Tot 
1 
As before these may be thrown into the form (1) with 


0 
0 
0 


F —— (¢ = 1,---,m). Hered = 0. 
2m 
Thus the distribution of d; and d; is given by (3) and (4) with these »,’s, 1 = m 
and p = 0, while the distribution of d, is given by (3) with 1 = m — 1, p = 3, 
and \ = 0. 


For the case of an unknown mean we propose the statistics 


which has characteristic roots vy; = 4 sin 


numerator of d; 


(6) dz = n 
Zz (x; ‘a #)° 
1 

and 


numerator of d> 


2 (x; — 2)’ 
1 


a= 
or dy (n = 2m + 1) which is of the same form as d; (the central observation in a
series of 2m + 1 being omitted). The distribution of d; and dj is given by (3)
with 


-,m-—1), l=m-—1, p=1, and A 


The distribution of d, is given by (3) and (4) with 


vy, = 4sin’ aa (¢ =1,---,m)l=m and p= 0. 


«- 


The distinction between the above statistics, exemplified by $c_1$, and the related
circular statistics, exemplified by

      $\dfrac{x_1x_2 + \cdots + x_{n-1}x_n + x_nx_1}{\sum_{1}^{n} x_i^2}$,

is now clear. In each case the numerator quadratic form has been modified from
the obvious form to take, namely $\sum_{2}^{n} x_i x_{i-1}$, to a form giving a statistic with a
known distribution. In $c_1$ this is done by throwing out the relevant term $x_m x_{m+1}$,
whereas in the circular statistic an extraneous term, $x_n x_1$, is included.


3. Application to regression problems. Suppose we have the sample corresponding
to the regression equation

      $y_t = \beta_1 x_{1t} + \cdots + \beta_k x_{kt} + \epsilon_t$

and we wish to test the $\epsilon_t$ for serial correlation. Exact tests are at present
available only for cases in which the characteristic roots of A in (1) occur with certain
multiplicities mentioned in Section 2, and the regression vectors $x_1, \cdots, x_k$
are linear functions of a suitable set of k of the characteristic vectors of A.
(T. W. Anderson [2].) Such cases are rare in practice. For other cases the general
problem has been discussed elsewhere (Durbin and Watson [3]). We suggest
here an approach that will give an exact test when the x vectors are the same in
different applications, as for example in polynomial regressions and analysis of
variance models.

For n = 2m or n = 2m + 1 suppose that separate least squares regression 
analyses are carried out on the first m and the last m observations. We shall 
confine ourselves to cases in which the regression vectors are the same in the 
two analyses. We may, for instance, have fitted a parabolic trend separately to 
each of the two sets of observations. Consider

      $r = \dfrac{z_1' B z_1 + z_2' B z_2}{z_1' z_1 + z_2' z_2}$,

where $z_1$ and $z_2$ are the two sets of residuals from regression, and B is a real
symmetric matrix with distinct roots. It has been shown (Durbin and Watson
[3]) that r is distributed as

      $\sum_{i=1}^{m-k} \mu_i\,(\eta_{2i-1}^2 + \eta_{2i}^2) \Big/ \sum_{i=1}^{m-k} (\eta_{2i-1}^2 + \eta_{2i}^2)$,

where the $\eta$'s are independent normal variables with zero means and unit variances
and $\mu_1, \cdots, \mu_{m-k}$ are the characteristic roots other than k zeros of the
matrix $(I - X(X'X)^{-1}X')B$. X here is the m x k matrix of independent variables
used in the subanalyses. Either of the two forms of B given in Section 2 may be
used. If the roots are known the distribution of r can be obtained from (3) or (4).
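
The construction can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' computation: it assumes NumPy, takes B to be the successive-difference form, fits a parabolic trend to each half, and uses the hypothetical helper names `difference_matrix` and `split_half_r`. The roots are obtained from the symmetric matrix MBM, which has the same characteristic roots as MB when M = I - X(X'X)^{-1}X' is idempotent.

```python
import numpy as np

def difference_matrix(m):
    """B with x'Bx equal to the sum of squared successive differences of an m-vector."""
    B = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    B[0, 0] = B[-1, -1] = 1.0
    return B

def split_half_r(y, X, B):
    """Statistic r from separate regressions on the first m and last m observations."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    m, k = X.shape
    n = y.size
    if n not in (2 * m, 2 * m + 1):
        raise ValueError("y must have length 2m or 2m + 1")
    M = np.eye(m) - X @ np.linalg.solve(X.T @ X, X.T)  # residual-forming matrix
    z1 = M @ y[:m]        # residuals of the first half
    z2 = M @ y[n - m:]    # residuals of the last half (central value skipped if n is odd)
    r = (z1 @ B @ z1 + z2 @ B @ z2) / (z1 @ z1 + z2 @ z2)
    roots = np.linalg.eigvalsh(M @ B @ M)  # same roots as M B, including k zeros
    return r, roots

m, k = 8, 3
t = np.arange(1, m + 1, dtype=float)
X = np.column_stack([np.ones(m), t, t ** 2])   # parabolic trend (illustrative choice)
rng = np.random.default_rng(0)
y = rng.standard_normal(2 * m)                 # illustrative data only
r, roots = split_half_r(y, X, difference_matrix(m))
mu = roots[roots > 1e-9]                       # the m - k roots other than zeros
```

Since r is a ratio of quadratic forms in the residuals, it always lies between the smallest and largest of the $\mu_i$, which gives a quick sanity check on any implementation.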

These results have applications to a number of problems in time series analysis. 
It is proposed, for example, to calculate the characteristic roots, and hence con- 
struct an exact test, for the residuals from a polynomial trend. Other regression 
models that could be treated in a similar way are one- and two-way classifications 
and periodic regressions. 

We might mention that the fitting of separate regressions to the two halves of 
the series will often be less artificial than might at first sight appear since it is in 
any case a common practice to break up time series into two or more parts for the 
fitting of trends. 


4. Powers of the statistics. T. W. Anderson [2] has discussed the Neyman-
Pearson theory for testing the hypothesis of serial independence of the error
terms of a regression equation. In testing for serial independence against the
alternative that the errors follow a stationary (normal) first order autoregressive
scheme

      $\epsilon_t = \rho\,\epsilon_{t-1} + u_t$,

he has shown that no uniformly most powerful or type $B_1$ test exists. From his
arguments it appears that, of the statistics whose exact distribution is known, the
statistics $c_1$ and $d_4$ should be most suitable respectively for series with a fixed
mean, known and unknown.

As has been noted elsewhere (Durbin and Watson [3]), Anderson’s results give 
us very little guidance for testing in a general regression model. Consequently the 
statistics suggested are justified only by their intuitive reasonableness. On the 
same intuitive grounds it is evident that the device of fitting two separate re- 
gressions is likely to bring about a substantial loss of power if the number of in- 
dependent variables is not small compared with the number of observations. 

The foregoing discussion of power has been conducted in terms of stationary 
Markov alternatives, partly because this is the case that is usually discussed, 
and partly because of its relative simplicity, not because we consider it to be of 
outstanding practical importance. For many cases found in practice the hypoth- 
esis that the errors follow a stationary stochastic process seems to us unrealistic. 
More usually, serial correlation of the errors will be due to systematic behaviour 
arising from the inadequacy of the theoretical model to represent the true re- 
lationship. This is a commonplace in econometrics, where tests of serial cor- 
relation are now often used as a routine procedure in the construction of models. 
In such situations the inappropriateness of a statistic which treats the products
$x_n x_1$ and $x_i x_{i+1}$ on the same footing is evident on intuitive grounds. On the same
grounds the statistics proposed above might prove to be more acceptable in
many such cases.
5. Significance tables of the statistics $c_1$, $d_4$. In Section 4 the statistic $c_1$
was suggested for a test of randomness in a sample of an even number of
observations from a population of known mean. A small table of the significance
points of this statistic is given below. If the observed value of $c_1$ is greater than the
tabulated value, the null hypothesis of randomness will be rejected at the
5% level of significance in favour of the hypothesis that positive serial correlation
is present. As the distribution of $c_1$ is symmetrical about zero, a test for
negative serial correlation may be made by considering $-c_1$. If the sample size
is odd, the central observation could be dropped and the tests made as above
with $c_3$.
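
A Monte Carlo check of such a significance point can be sketched as follows. This is only an illustrative simulation under the null hypothesis of independent standard normal observations, assuming NumPy; the names `c1_batch` and `upper_point`, the number of replications, and the seed are our own choices.

```python
import numpy as np

def c1_batch(x):
    """c_1 for each row of x; rows have even length n = 2m and a known zero mean."""
    m = x.shape[1] // 2
    a, b = x[:, :m], x[:, m:]
    # Lag-one products within each half of every simulated series.
    num = (a[:, :-1] * a[:, 1:]).sum(axis=1) + (b[:, :-1] * b[:, 1:]).sum(axis=1)
    return num / (x * x).sum(axis=1)

def upper_point(n, level=0.05, reps=40000, seed=0):
    """Simulated upper significance point of c_1 for independent normal data."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, n))
    return float(np.quantile(c1_batch(x), 1.0 - level))

estimate_n10 = upper_point(10)
median_n10 = float(np.median(c1_batch(np.random.default_rng(1).standard_normal((40000, 10)))))
```

The simulated point for n = 10 can be compared with the tabulated values, and the simulated median should be close to zero, reflecting the symmetry of the distribution.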


5% point of $c_1$ for various n

      n       10     12     14     16     18     20     22
      c_1    0.426  0.403  0.382  0.364  0.348  0.334  0.321


For a test of serial independence in a series of unknown mean, the statistic
$d_4$ has been suggested when the sample size is even. If the observed value of $d_4$
is less than the value tabulated below, the null hypothesis is rejected at the
5% level in a test against the alternative of positive serial correlation. For samples
of odd size, the middle observation may be omitted so that $d_4$ is still applicable.

5% point of $d_4$ for various n

      n       12     14    16    18    20    22    24    26    28    30
      d_4    0.967  1.04  1.11  1.16  1.20  1.24  1.27  1.30  1.33  1.35
DISTRIBUTION OF THE ORDINAL NUMBER OF SIMULTANEOUS
EVENTS WHICH LAST DURING A FINITE TIME' 


By HERMANN VON SCHELLING 
Naval Medical Research Laboratory, New London, Connecticut 


1. Introduction. The probability of drawing a white ball from an urn is p, and 
the complementary probability of getting a black ball is (1 - p) = q. One
ball is drawn and returned during one time unit. When a white ball appears, 
the play is interrupted for k time units. Then it starts anew. 

If it happens that at the nth time unit a white ball occurs, we ask for the 
probability w(m; n, k, p) that it is the mth ball since the first beginning of the 
trials. We are interested in the mean E(m) and in the variance Var(m), and in 
simple approximations for E(m/n) and Var(m/n) when n is large. 

2. The probability distribution. Let us start with the relative probabilities.
If the first white ball appeared at the nth moment, (n - 1) black balls preceded,
which means that the relative probability is $q^{n-1}$. If it was the second white ball,
the number of black balls was reduced by (k + 1), k for one interruption of the
play lasting k time units and 1 for the first white ball, which occurred with the
probability p. The group of [(n - 1) - (k + 1)] black balls may be broken
into any two parts, including the case of one being empty. That makes $\binom{(n-1)-k}{1}$
possibilities. Therefore the relative probability for m = 2 is

(1)   $q^{n-1} \binom{(n-1) - k(m-1)}{m-1} \left(\dfrac{p}{q^{k+1}}\right)^{m-1}$,

with m = 2. It is easy to verify step by step that the general formula is correct for
$m = 1, 2, \cdots, 1 + \left[\dfrac{n-1}{k+1}\right]$. Hence the preliminary answer to our problem is

(2)   $w(m; n, k, p) = \dfrac{1}{C}\, q^{n-1} \binom{(n-1) - k(m-1)}{m-1} \left(\dfrac{p}{q^{k+1}}\right)^{m-1}$,
      $k = 0, 1, 2, \cdots$;   $m = 1, 2, \cdots, 1 + \left[\dfrac{n-1}{k+1}\right]$,

where [a] means the largest integer $\le a$. The constant C has to be
determined by

(3)   $\sum_{m=1}^{1+[(n-1)/(k+1)]} w(m; n, k, p) = 1$.

* Opinions or conclusions contained in this paper are those of the author. They are not 
to be construed as necessarily reflecting the views or endorsement of the Navy Department. 
This is a strictly algebraic problem. According to (3)

(4)   $C = q^{n-1} \sum_{m=1}^{1+[(n-1)/(k+1)]} \binom{(n-1) - k(m-1)}{m-1} \left(\dfrac{p}{q^{k+1}}\right)^{m-1} = q^{n-1}\, s_n\!\left(\dfrac{p}{q^{k+1}}\right)$.

For abbreviation let us write

(5)   $x = p/q^{k+1}$.

It can be proved easily that

(6)   $s_{n+1}(x) - s_n(x) - x\,s_{n-k}(x) = 0$.

This is a linear recursion formula which can be solved in principle. The particular
solution in question must satisfy the conditions

(7)   $s_1(x) = s_2(x) = \cdots = s_{k+1}(x) = 1$.

The characteristic equation belonging to this linear recursion formula with
constant coefficients reads

(8)   $y^{k+1} - y^k - x = 0$

or

(9)   $f(z) = z^{k+1} - q z^k - p = 0$

using the abbreviation z = qy. Then

(10)  $f'(z) = z^{k-1}\{(k+1)z - kq\}$.

It follows that f'(z) = 0 for z = 0 (multiplicity k - 1) and for z = kq/(k + 1)
(multiplicity 1). We find $f(0) = -1 + q \ne 0$ if $q \ne 1$. The exception is irrelevant.
On the other hand

      $f(kq/(k+1)) = -k^k \left(q/(k+1)\right)^{k+1} - p$.

For 0 < q < 1 this expression is negative. Hence f'(z) = 0 and f(z) = 0 do not
have common roots. That means that the roots of the characteristic equation (9)
are different from each other.
Therefore a regular solution of the linear difference equation (6) exists; it is

(11)  $s_n(x) = \sum_{i=1}^{k+1} \dfrac{y_i^{\,n}}{(k+1)y_i - k}$,

where $y_1, y_2, \cdots, y_{k+1}$ are the (k + 1) different roots of the characteristic
equation (8). Summarizing, we find according to (2), (4), (11)

(12)  $w(m; n, k, p) = \binom{(n-1) - k(m-1)}{m-1} \left(\dfrac{p}{q^{k+1}}\right)^{m-1} \Big/ \sum_{i=1}^{k+1} \dfrac{y_i^{\,n}}{(k+1)y_i - k}$.

3. The mean and variance. We may write

(13)  $w(m; n, k, p) = \binom{(n-1) - k(m-1)}{m-1} x^{m-1} \Big/ s_n(x)$,

where $x = p/q^{k+1}$ and $s_n(x)$ is given by (4). It is obvious that

(14)  $\dfrac{x\,\dfrac{d}{dx} s_n(x)}{s_n(x)} = E(m - 1)$,   $\dfrac{x^2\,\dfrac{d^2}{dx^2} s_n(x)}{s_n(x)} = E[(m-1)(m-2)]$.

Hence

(15)  $E(m) = 1 + \dfrac{x\,\dfrac{d}{dx} s_n(x)}{s_n(x)}$,

(16)  $E(m^2) = -2 + 3E(m) + \dfrac{x^2\,\dfrac{d^2}{dx^2} s_n(x)}{s_n(x)}$.

The finite series $s_n(x)$ can be differentiated term by term. We use the operator

(17)  $\dfrac{d}{dx} = \dfrac{dy_i}{dx}\,\dfrac{d}{dy_i} = \dfrac{1}{y_i^{k-1}\{(k+1)y_i - k\}}\,\dfrac{d}{dy_i}$.

The evaluation of (15) and (16) is only a technical matter. Let me list the final
results:


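(18)  $E(m) = 1 + \dfrac{p}{q^{k+1} s_n(x)} \left\{ (k+1)(n-1) \sum_{i=1}^{k+1} \dfrac{y_i^{\,n-k+1}}{[(k+1)y_i - k]^3} - kn \sum_{i=1}^{k+1} \dfrac{y_i^{\,n-k}}{[(k+1)y_i - k]^3} \right\}$,

(19)  $E(m^2) = -2 + 3E(m) + \dfrac{(p/q^{k+1})^2}{s_n(x)} \left\{ (k+1)^2 (n-1)(n-k-2) \sum_{i=1}^{k+1} \dfrac{y_i^{\,n-2k+2}}{[(k+1)y_i - k]^5} \right.$
      $\qquad - k(k+1)\,[2n^2 - (2k+3)n + (k-1)] \sum_{i=1}^{k+1} \dfrac{y_i^{\,n-2k+1}}{[(k+1)y_i - k]^5}$
      $\qquad \left. +\, k^2 n(n-k) \sum_{i=1}^{k+1} \dfrac{y_i^{\,n-2k}}{[(k+1)y_i - k]^5} \right\}$.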

Now we find as usual $\mathrm{Var}(m) = E(m^2) - [E(m)]^2$. These formulas are exact,
but highly theoretical, since the roots $y_i$ are unknown if k > 4, except $y_1 = 1/q$.
Also, if we knew the roots, the equations would be too complicated. There is an
urgent need for convenient approximations.
Let us write the characteristic equation (9) as follows:

(20)  $p(1/z)^{k+1} + q(1/z) = 1$.

This relation is impossible for $|1/z| < 1$ and is satisfied for $|1/z| = 1$ only if
$1/z = +1$. Therefore, we have found that $z_1 = 1$ (according to (10) $y_1 = 1/q$)
is the absolutely largest root of the characteristic equation. Powers of the form

(21)  $(y_i/y_1)^n$,   $i = 2, 3, \cdots, (k + 1)$,

will converge rapidly to zero for large n. This fact produces the following
approximations, true for large n and for $k = 0, 1, 2, \cdots$:

(22)  $w(m; n, k, p) \approx (1 + kp)\, q^{n-1} \binom{(n-1) - k(m-1)}{m-1} \left(\dfrac{p}{q^{k+1}}\right)^{m-1}$,
      with $1 \le m \le 1 + \left[\dfrac{n-1}{k+1}\right]$;

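(23)  $E(m/n) \approx \dfrac{p}{1 + kp} + \dfrac{1}{n}\left(1 - \dfrac{(k+1)p}{(1 + kp)^2}\right)$;

(24)  $\mathrm{Var}(m/n) \approx \dfrac{pq}{n(1 + kp)^3}\left(1 - \dfrac{1}{n}\,\dfrac{(k+1)(1 - kp)}{1 + kp}\right)$.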


Equation (22) is presented here only with a warning that the accuracy may 
not be good enough for some purposes. But the approximations (23) and (24) 
are fair. 

For large n

(25)  $\dfrac{\mathrm{Var}(m/n)}{[E(m/n)]^3} \approx \dfrac{1}{n}\,\dfrac{1 - p}{p^2}$,

independently of k. This relation is essentially a consequence of dimensional
considerations. In applications the first member of (25) may be known from
observations. Then the second member appraises p. From $E(m/n) \approx \dfrac{p}{1 + kp}$
the parameter k could be estimated.
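
To see how close the large-n approximations come in a particular case, one can compare them with moments computed directly from the exact distribution. The sketch below is illustrative only. It takes the approximation to the mean to be $p/(1+kp) + (1/n)(1 - (k+1)p/(1+kp)^2)$ and the ratio relation to be $\mathrm{Var}(m/n)/[E(m/n)]^3 \approx (1-p)/(np^2)$; both are our reading of the formulas above, and the function names and parameter values are hypothetical.

```python
from math import comb

def exact_moments(n, k, p):
    """Mean and variance of m computed directly from the binomial-weight form of w."""
    q = 1.0 - p
    x = p / q ** (k + 1)
    top = 1 + (n - 1) // (k + 1)
    weights = [comb((n - 1) - k * (m - 1), m - 1) * x ** (m - 1)
               for m in range(1, top + 1)]
    total = sum(weights)
    mean = sum(m * wt for m, wt in enumerate(weights, start=1)) / total
    second = sum(m * m * wt for m, wt in enumerate(weights, start=1)) / total
    return mean, second - mean ** 2

def approx_mean_ratio(n, k, p):
    """Assumed reading of the large-n approximation to E(m/n)."""
    return p / (1 + k * p) + (1 - (k + 1) * p / (1 + k * p) ** 2) / n

n, k, p = 200, 2, 0.3
mean, var = exact_moments(n, k, p)
exact_ratio = mean / n
approx_ratio = approx_mean_ratio(n, k, p)
lhs_25 = (var / n ** 2) / exact_ratio ** 3   # first member of the ratio relation
rhs_25 = (1 - p) / (n * p ** 2)              # second member, free of k
```

In this reading the mean approximation differs from the exact value only through the neglected powers of the smaller roots, while the ratio relation holds to first order in 1/n.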

The equations (22) through (25) answer all our questions in a convenient 
manner. The new class of distributions is related to the Bernoullian type. It is 
to be expected that it will become useful in many fields of applied probability 
theory. 
MINIMUM GENERALIZED VARIANCE FOR A SET OF
LINEAR FUNCTIONS' 


By ROBERT G. D. STEEL


University of Wisconsin 


1. Summary. Let n variates possessing finite first and second moments be 
partitioned into k sets. A system of equations is developed for which some 
solution consists of k sets of coefficients which combine the k sets of variates into 
k variates possessing minimum generalized variance. 


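2. Introduction. Let $x_1, \cdots, x_n$ be observed variates having zero means and
finite covariances, $\sigma_{ij}$, $i, j = 1, \cdots, n$. Let the variates be such that

(1)   $|\sigma_{ij}| \ne 0$,

where $|\sigma_{ij}|$ is the determinant of the covariance matrix, $\Sigma = (\sigma_{ij})$. Partition
the n variates into k vector variates

(2)   $x_j = (x_{m_j+1}, \cdots, x_{m_j+n_j})$,   $j = 1, \cdots, k$,

where $m_j = \sum_{i=1}^{j-1} n_i$, $j \ne 1$, $m_1 = 0$, and $n_i \le n_{i+1}$. Partition $\Sigma$
correspondingly as

(3)   $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \cdots & \Sigma_{1k} \\ \vdots & \vdots & & \vdots \\ \Sigma_{k1} & \Sigma_{k2} & \cdots & \Sigma_{kk} \end{pmatrix}$,

where $\Sigma_{ij}$ is an $n_i$ by $n_j$ matrix with transpose $\Sigma_{ji}$.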
For k = 2, Hotelling [1] forms two variates, one linear function of the variates 
for each vector variate, for which the correlation is maximum. This leads to 
canonical variates and canonical correlations. Wilks’ \,; criterion [2] is the likeli- 
hood ratio criterion for testing independence among k sets of normally dis- 
tributed variates. 
We consider the choice of k linear functions, one per vector variate (2), such
that the k resulting variates possess minimum generalized variance when each
has unit variance.


¹ This paper is based on doctoral dissertation 957 on file at the Iowa State College Library
and submitted to the Graduate Faculty of Iowa State College in partial fulfillment of the
requirements for the degree Ph.D. in Mathematical Statistics. A part of the research was
conducted under the sponsorship of the Office of Naval Research. The principal result in
the paper was presented to the American Mathematical Society in Lawrence, Kansas,
April 30, 1949.
3. Some theorems and the procedure.

THEOREM 1. A set of linear functions with unit variances exists possessing 
minimum generalized variance. 

PROOF. From the vector variates (2), form variates


: , 
(4) AnH 


(‘ denotes transpose), where a. = (Gm,41, °** » @mj+n,) is @ 1 by n; vector of 
real numbers such that 


(5) ap 2jjay = 1, j=l1,---,k. 


The variates (4) have zero means and unit variances. Denote the determinant
of the covariance matrix by C. C is a bounded continuous function of the n $a_i$'s
and is defined jointly over k closed connected sets, the positive definite quadratic
forms (5). Therefore, there exists a minimum for C.

Apply real nonsingular transformations to the vector variates (2): 


(6) ayTi = bn, o> yk, 
for T; an n; by nj; matrix such that 
(7) T;24jT; = 1, 


I; is an n; by n; identity matrix. 
The n by n matrix 


(Ts 0 0---0 0 


(8) 


(0 0 0---0 T% 


will be called an internal linear transformation when applied to the k ordered 
vector variates simultaneously. 7 is real and nonsingular. 
The covariance matrix of (& , --- , &,) is 


T:22T: +++ T:2uTx) 
TX Ti 


Ti2ieT: Tre Tr -- 
Denote it by I and 7,2;;T; by Ti;. 
Lemma 1. Transforming matrices T ; subject to (7) differ only by multiplication 
by orthogonal matrices. 
Hence matrices such as (8) differ only by multiplication by orthogonal matrices 
consisting of blocks of orthogonal matrices in the diagonals and zeros elsewhere. 
Restrict further internal linear transformations to orthogonal matrices. 
From among such transformations, it is required to choose one such that the
k resulting variates shall possess minimum generalized variance. Clearly, this
restriction on the transformations is no restriction on the value of the minimum
generalized variance. Let each of the k desired variates be the first in its
respective vector variate. Movement of any variate within a vector variate can
be accomplished by means of a permutation matrix, included among the
permissible orthogonal matrices.

Consider the effect of an orthogonal internal linear transformation on the
covariance matrix of the k variates that are the first ones in the k vector variates
(6) subject to (7). To do this, examine the effect of the transforming matrix on
the appropriate k by k principal minor of $\Gamma$. The effect may be observed by
using compound matrices, in which any principal minor of the matrix
compounded occurs as a diagonal element. Denote the kth compound of a square
matrix A with elements $a_{ij}$ by $A^{(k)}$ with elements $a^{(k)}_{ij}$.

Let the orthogonal internal linear transformation be P with 


ne / 
(9) Ea) Pa = Ma); 


for P, and n, by nq orthogonal matrix. Denote the elements of Pa by api; , 
J = 1--+, ma. The covariance matrix of (m, ---, 2m) is PIP’. Denote 


this by N. The kth compound of N is 


l 


(10) NP a PRP" 2 Per. 


The principal minor of ! which we wish to observe under the transformation 
(9) is that with diagonal elements the unities in the upper left corners of the 
rm: ° ‘ © alk a » ° ° 
[;. This minor appears as a diagonal element of T’, its transform being in the 

Gas . r(k) 
same position in NV’. 
The transformed principal minor from (10) is given by 
n 
- 
y(k) , (k) 
ay 4 yea, eee, ben)’ = ts Yas ls 
(11) F (x) a ae sg=1 


— 
= Vo» 


where the subscript g is appropriately chosen and the elements ¢; are the k 
by k minors, ordered lexicographically, of the matrix consisting of rows 
m, + 1, ---,m, + 1 of P. Clearly each nonzero ¢; consists of a single product 
of one element from the first row of each P, . 

The problem is to determine those op,’s,i = 1, --- ,m_ anda = 1, --- ,k, 
which minimize y}*?. Hence let us obtain partial derivatives of the elements 
of the P,.’s with respect to the parameters of these orthogonal matrices. 


4. The derivative of an orthogonal matrix. If a nonsingular square matrix A
is a function of x, then by differentiating $AA^{-1}$ it follows that

(12)  $\dfrac{dA}{dx} = -A\,\dfrac{dA^{-1}}{dx}\,A$,   $\dfrac{dA^{-1}}{dx} = -A^{-1}\,\dfrac{dA}{dx}\,A^{-1}$.
LEMMA 2. For a real nonexceptional orthogonal matrix Q,

(13)  $\dfrac{\partial q_{\alpha\beta}}{\partial s_{ij}} = -\dfrac{1}{2} \begin{vmatrix} \delta_{\alpha i} + q_{\alpha i} & \delta_{\alpha j} + q_{\alpha j} \\ \delta_{i\beta} + q_{i\beta} & \delta_{j\beta} + q_{j\beta} \end{vmatrix}$,

where $q_{\alpha\beta}$ is the $\alpha, \beta$th element of Q and $s_{ij}$ is the i, jth element of the skew-symmetric
matrix S in Cayley's² parametrization of Q, viz., Q = (I - S)/(I + S).
PROOF. Let the parameters be $s_{ij}$, i < j. By (12) and since $(I + S)^{-1} = \tfrac{1}{2}(I + Q)$,

      $\dfrac{\partial Q}{\partial s_{ij}} = -\dfrac{1}{2}\,(I + Q)\,\dfrac{\partial S}{\partial s_{ij}}\,(I + Q)$.

Now $\partial S/\partial s_{ij}$ is a matrix consisting of a +1 in the i, jth position, a -1 in the
j, ith position and zeros elsewhere. Hence, elementwise, we have (13) where $\delta$
is Kronecker's delta. The partial derivatives are expressed in terms of the
elements of Q.
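
The derivative formula can be verified numerically by finite differences. The following sketch is only an illustration, assuming NumPy and reading the result as $\partial Q/\partial s_{ij} = -\tfrac12 (I+Q)(E_{ij} - E_{ji})(I+Q)$, where $E_{ij}$ has a single unit in position (i, j); the function names, the random test matrix, and the step size are hypothetical choices.

```python
import numpy as np

def cayley(S):
    """Cayley parametrization Q = (I - S)(I + S)^{-1} for skew-symmetric S."""
    I = np.eye(S.shape[0])
    return (I - S) @ np.linalg.inv(I + S)

def dQ_analytic(Q, i, j):
    """Element (a, b) is -1/2 [(d_ai + q_ai)(d_jb + q_jb) - (d_aj + q_aj)(d_ib + q_ib)]."""
    A = np.eye(Q.shape[0]) + Q
    return -0.5 * (np.outer(A[:, i], A[j, :]) - np.outer(A[:, j], A[i, :]))

def dQ_numeric(S, i, j, h=1e-6):
    """Central difference in s_ij, keeping S skew-symmetric (s_ji = -s_ij)."""
    E = np.zeros_like(S)
    E[i, j], E[j, i] = 1.0, -1.0
    return (cayley(S + h * E) - cayley(S - h * E)) / (2.0 * h)

rng = np.random.default_rng(1)
G = rng.standard_normal((4, 4))
S = G - G.T                      # a skew-symmetric test matrix
Q = cayley(S)
i, j = 0, 2
analytic = dQ_analytic(Q, i, j)
numeric = dQ_numeric(S, i, j)
```

Writing the derivative through outer products of rows and columns of I + Q mirrors the two-by-two determinant in the lemma, so no explicit matrix of partial derivatives of S needs to be formed.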

An exceptional orthogonal matrix becomes nonexceptional when multiplied
by an appropriate J matrix, a square matrix with +1 or -1 in each diagonal
position and zeros elsewhere. When JP replaces P, principal minors of $P\Gamma P'$
will not alter in value though some variates will change sign. Hence, let us
consider only real nonexceptional orthogonal transformations.

To maximize v\*? as in (11), equate to zero: 


(i) 


b=1 dls O aSij 


(14) ovee dts 
8 «Si; 9 a8ij’ 


9) 
av dts “ (k) Ota 
=—" = 2 > tsv - 
B,d—1 


where ,8;; is a parameter of P,, a = 1, --- , k. The partial derivatives of the 
ts’s are found by use of (13). Equations (14) with those imposing orthogonality 
restrictions on the P, are =, ~.n2 simultaneous equations in as many unknowns, 
the elements of the P,’s. 


5. Two-set case. For k = 2, variates having minimum generalized variance
are seen to have maximum correlation. Since Hotelling's canonical form is
unique except for the order and signs of the elements in the diagonal of the
off-diagonal block and the maximum correlation is always present there, it can be
shown that a solution obtained by the present method will agree with that from
Hotelling's method.


6. A three-set case. For k = 3 and nm = m = n; = 2, let I’ be such that each 
of (Ti, Tis), (Ti2, 23), and (Tis A Tse) has unit rank. Chu [4] has shown that 
the rows of (J; , Tis , Ts), (Tis , Ie , To3), and (Tis ’ les , I.) can all be internally 


² For a recent work, see Weyl [3].
orthogonalized by an orthogonal internal linear transformation. The resulting
covariance matrix may be written


(] 0 P13 0 
oe £ 9 


0 
og 8 § 


where X, Y, and Z are 2 by 6 matrices. 

Now XX’, YY’, and ZZ’ have distinct characteristic roots. Hence (15) is unique 
except for internal permutations of rows and columns and changes in the signs 
of pis, pis, and p;;. Also, the orthogonal transforming matrix is unique except 
that the signs of the elements in any row may be altered simultaneously. 

It is now easy to find variates, one per set, with minimum generalized variance. 
It can be shown that the only permissible orthogonal internal linear transforma- 
tion consists of an identity matrix. Hence the minimum generalized variance is 


| ] Pis Pb 


It is unique, as are also the internal linear transformation and the resulting 
variates. 
ON $\epsilon$-COMPLETE CLASSES OF DECISION FUNCTIONS¹


By J. WOLFOWITZ
Columbia University 


An example of an “almost subminimax” solution² is given in a paper by
Hodges and Lehmann ([1], Section 5, Problem 1). Another paper, written
independently by Robbins [2], has put much stress on the idea and given a
considerable discussion and many examples. Frank and Kiefer [3] have given a
prescription for constructing almost subminimax solutions.

Let $\Omega = \{F\}$ be a set of distribution functions F. The statistician has to make
one of a set of decisions in a space $D^t = \{d^t\}$. He takes observations in stages
(finite subsets) on an infinite sequence of chance variables $X = X_1, X_2, \cdots$,
distributed according to an unknown one of the F's. The statistician employs a
decision function $\delta$, a rule (which may involve randomization) which tells him
when to stop taking observations and what decision to make when he has
stopped taking observations. The risk $r(F, \delta)$ of a decision function (d.f.) $\delta$
when F is the distribution function of X is the sum of the expected values of the
cost function and the loss function. All these ideas are described rigorously
and in detail in the book [4] by Wald, whose notation we adopt. Some familiarity
with this book and its ideas will be assumed. The brief résumé given in this
paragraph was partly for the purpose of recalling some of the important notation.

An almost subminimax d.f. $\delta^*$ may be roughly described as follows. Let $\delta^{**}$ be,
say, an admissible minimax d.f. For all F's in $\Omega$ we have $r(F, \delta^*) < r(F, \delta^{**}) + \epsilon$
with $\epsilon$ “small” and positive, while for “most” F's in $\Omega$ we have $r(F, \delta^{**}) - r(F, \delta^*)$
equal to a “large” positive number.

An important task of the mathematical statistician is to exhibit a complete
class of d.f.'s for a problem under consideration; an essentially complete class
is even more useful³. The choice of d.f. from among the members of an essentially
complete class requires additional principles. A possible principle is to choose a
minimax d.f. (There may be more than one minimax d.f., and even more than
one admissible minimax d.f.) This might be the course of a very conservative
statistician whose ignorance of F is complete. The appeal of an almost
subminimax d.f. in preference to a minimax d.f. occurs when the statistician considers
$\epsilon$ small, or the F's for which $r(F, \delta^*) > r(F, \delta^{**})$ rather “unlikely,” or both.
There will usually be little difficulty in deciding when $\epsilon$ is small, but perhaps
considerable difficulty in deciding that some “few” F's are “unlikely” and
just what is to be done about them.

¹ Research under a contract with the Office of Naval Research.

² This represents a slight change in nomenclature from that of Robbins' paper [2].

³ Provided, of course, that it is not complete.

Let $\epsilon > 0$ be a fixed number. A d.f. $\delta_1$ will be called $\epsilon$-equivalent to the d.f.
$\delta_2$ if $|r(F, \delta_1) - r(F, \delta_2)| \le \epsilon$ for all F in $\Omega$. (The relation of $\epsilon$-equivalence is
obviously not transitive.) A d.f. $\delta_1$ will be said to be $\epsilon$-better than the d.f. $\delta_2$ if $\delta_1$
and $\delta_2$ are not $\epsilon$-equivalent and if $r(F, \delta_1) \le r(F, \delta_2) + \epsilon$ for every F in $\Omega$. A
class C of d.f.'s will be called essentially $\epsilon$-complete if, for any d.f. $\delta_1$ not in C,
there exists a d.f. $\delta_2$ which is a member of C and is such that either $\delta_2$ is
$\epsilon$-equivalent to $\delta_1$ or $\delta_2$ is $\epsilon$-better than $\delta_1$.
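
For a finite set of distributions these two relations are easy to state in code. The sketch below is a toy illustration only: each decision function is represented by its vector of risks over finitely many F's, and the function names and the numerical risk values are hypothetical.

```python
def eps_equivalent(r1, r2, eps):
    """True if the two risk vectors differ by at most eps at every F."""
    return all(abs(a - b) <= eps for a, b in zip(r1, r2))

def eps_better(r1, r2, eps):
    """True if r1 is not eps-equivalent to r2 and never exceeds r2 by more than eps."""
    return (not eps_equivalent(r1, r2, eps)) and all(a <= b + eps for a, b in zip(r1, r2))

# Non-transitivity: A ~ B and B ~ C, but A and C differ by more than eps.
risk_a, risk_b, risk_c = [0.0], [0.1], [0.2]
eps = 0.15
chain = (eps_equivalent(risk_a, risk_b, eps),
         eps_equivalent(risk_b, risk_c, eps),
         eps_equivalent(risk_a, risk_c, eps))

# A toy minimax rule with constant risk and a toy almost subminimax rule.
minimax_risk = [0.5, 0.5, 0.5]
subminimax_risk = [0.55, 0.1, 0.1]
prefers_subminimax = eps_better(subminimax_risk, minimax_risk, 0.1)
```

In the toy example the second rule exceeds the minimax risk by less than eps at one F and is far below it elsewhere, so it is eps-better without any judgment about which F's are unlikely.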

If an almost subminimax d.f. $\delta^*$ exists, then $\delta^*$ is $\epsilon$-better than the minimax
decision function $\delta^{**}$. If the statistician regards differences of at most $\epsilon$ in the
risk function as negligible, and differences larger than $\epsilon$ as meaningful, he will
prefer $\delta^*$ to $\delta^{**}$ and this preference will not depend upon which F's are “few” or
“unlikely.” We shall show that under certain conditions there exists an essentially
$\epsilon$-complete class $C_0$ which contains only a finite number of elements, and in the
next paragraph we will formulate the conditions precisely.

We shall assume that Assumptions 3.3 to 3.7 of Wald [4] hold, and in addition
that the available d.f.'s are all such that

(1)   $r(F, \delta) < \infty$ for every F in $\Omega$;

(2)   $\lim_{k \to \infty} \sum_{j=k}^{\infty} P_j(F, \delta)\, \mathrm{Sup}_{x, s^j}\, c(x; s^j) = 0$

uniformly in $\delta$ and the F in $\Omega$. Here $P_j(F, \delta)$ is the probability according to
F that, when $\delta$ is employed, a decision will be reached in exactly j stages of
observations, and $s^j$ is any j-stage set of observational indices such that the
cost function $c(x; s^j)$ is not $+\infty$ identically in x. It follows from Assumptions 3.5
(II) and 3.6 (III) of [4] that $\mathrm{Sup}_{x, s^j}\, c(x; s^j)$ is always finite. Equations (1) and
(2) are essentially Condition 7 on page 298 of [5]. Note that the statistician may
use d.f.'s which require no observations at all.

Under these conditions we shall prove that there exists an essentially
$\epsilon$-complete class $C_0$ whose elements are finite in number.

Define 


(3) ar(F,, F2) = Sup | P{R| Fi} — P{R| Fe} |, 
R 
where RF is any Borel set of the r-dimensional Euclidean space, and 
(4) p(Fi, F2) = Sup | W(F,, d') — W(Fs, da‘) |. 
dtept 


Finally, let 


(5) e(F: Fs) = p(Fi, F2) + 2 2 pir(Fi, Fe). 


It follows easily from Assumption 3.7 that Q is compact in the sense of the 
metric p(F, , Fe). 
Let $\epsilon > 0$ be given arbitrarily. There exists an $m = m(\epsilon)$ such that

(6)   $\mathrm{Sup}_{F, \delta}\, |r(F, \delta^m) - r(F, \delta)| < \dfrac{\epsilon}{64}$,
where $\delta^m$ is the d.f. $\delta$ truncated (in any manner whatever) after m stages of
observations. In m stages of observations one can have observations only on the
chance variables $X_1, \cdots, X_{m^*}$, where $m^*$ is a finite-valued function of m.
Just as in the proof of Theorem 3.3 of [4], one proves that $r(F, \delta^m)$ is continuous
with respect to the metric $\rho(F_1, F_2)$ uniformly with respect to all available $\delta^m$. We
conclude from (6) that there exist a finite number q of F's, say $F_1^*, \cdots, F_q^*$,
with the property that for every F in $\Omega$ there exists some element $F_i^*$ such that


(7) Sup | r(F, 8) — r(F%, 8) | < xe 
} 
for all available 6. 

We now define

(8)   $n(d_1^t, d_2^t) = \mathrm{Sup}_{F \in \Omega}\, |W(F, d_1^t) - W(F, d_2^t)|$.

There exist a finite number of elements $d^t$, say $g_1, \cdots, g_l$, such that for every
$d^t$ in $D^t$ there exists an element $g_i$ such that $n(d^t, g_i) < \epsilon/64$. We shall now
temporarily limit the statistician to decisions in $D^* = \{g_1, \cdots, g_l\}$. This means
that a d.f. $\delta$ is to be replaced by $\delta^*$, where $\delta^*$ is derived from $\delta$ in the following
manner: $\delta$ and $\delta^*$ are the same except that, whenever $\delta$ tells the statistician to
make a decision $d^t$ in $D^t$, $\delta^*$ tells the statistician to make a decision in $D^*$ nearest
to $d^t$ in the sense of (8). Obviously,


(9) | r(F, 8) — r(F, 8*) | < a 


for every available 6 and every F in Q. 

Consider now the following statistical problem: $\Omega^* = \{F_1^*, \cdots, F_q^*\}$ and
$D^* = \{g_1, \cdots, g_l\}$ are the spaces of distributions and decisions, respectively.
The statistician is allowed at most $m^*$ observations in various possible steps.
The cost function and other assumptions are as before. It follows that the set S
of points in Euclidean q-dimensional space $\{r(F_1^*, \delta^{*m}), \cdots, r(F_q^*, \delta^{*m})\}$ for
every available $\delta^{*m}$, is bounded and convex: bounded because the loss and cost
functions are bounded, and convex because the totality of available d.f.'s is
convex. Obviously a finite number of decision functions corresponding to a
finite number of points of S which lie close to the periphery of S constitute an
essentially $\epsilon/4$-complete class for the $\Omega^*$, $D^*$ problem with respect to all available
d.f.'s which permit at most m stages of observations, and with respect to all
d.f.'s which can be obtained from an available d.f. by truncation at the mth
stage. Call the resulting class $C_0$.

It remains to prove that $C_0$ is essentially $\epsilon$-complete for the problem $\Omega$, $D^t$,
and with respect to all available decision functions. Let $\delta_0$ be any d.f. not in $C_0$,
and consider the d.f. $\delta_0^{*m}$. There are now two possibilities: (a) $\delta_0^{*m}$ is
$\epsilon/4$-equivalent to some member h of $C_0$ with respect to the problem $\Omega^*$, $D^*$; (b) $\delta_0^{*m}$ is not
$\epsilon/4$-equivalent to any member of $C_0$ with respect to the problem $\Omega^*$, $D^*$.
Suppose (a) holds. Let F be any element of $\Omega$. By use of (7) we obtain that


(10) r(F, 0") — r(F,h) | < = 


From (9) we have 
23¢€ 


(11) | r(F, 50) — r(F,h) | < 4° 


Finally from (6) we have that 


(12) r(F, &) — r(F,h) | < 2 


' ? 


so that $\delta_0$ and h are surely $\epsilon$-equivalent.
Suppose now that (b) holds. Let t be $\epsilon/4$-better than $\delta_0^{*m}$ with respect to the
problem $\Omega^*$, $D^*$. For any F such that there exists a nearest $F_i^*$ for which


(13) | r(Fz, 80") — r(F%, 2) | <3 


we have, just as in the previous case, 


(14) r(F, &) — r(F,t) | < =. 


If for some F (13) does not hold we must have, for a nearest $F_i^*$, since t is
$\epsilon/4$-better than $\delta_0^{*m}$ with respect to the problem $\Omega^*$, $D^*$,


(15) r(Fi, 60") > r(Fi, O +3 
Now 

| r(F, 6) — r(Fi, 0”) | < ae 
de 


(F om - 
| r(F, t) r(Ft Q|\< = 64° 


Hence 
(16) r(F, 50) > r(F, 0) + 5 


Since either (14) or (16) must hold we conclude that t is either $\epsilon$-equivalent to
$\delta_0$ or t is $\epsilon$-better than $\delta_0$. This proves that $C_0$ is an essentially $\epsilon$-complete class
for the original problem.


REFERENCES 


{1] J. L. Hopees ano li. L. Lenmann, “Some problems in minimax point estimation,” 
Annals of Math. Stat., Vol. 21 (1950), pp. 182-197. 





ALMOST SUBMINIMAX PROCEDURES 465 


[2] Herbert Robbins, “Asymptotically subminimax solutions of compound statistical
decision problems,” Proceedings of the Second Berkeley Symposium on Mathematical
Statistics and Probability, University of California Press, 1951.

[3] P. Frank and J. Kiefer, “Almost subminimax and biased minimax procedures,” Annals
of Math. Stat., Vol. 22 (1951), pp. 465-468.

[4] A. Wald, Statistical Decision Functions, John Wiley and Sons, 1950.

[5] A. Wald, “Foundations of a general theory of sequential decision functions,”
Econometrica, Vol. 15 (1947), pp. 279-313.




ALMOST SUBMINIMAX AND BIASED MINIMAX PROCEDURES¹
By P. FRANK AND J. KIEFER 
Columbia University 


Robbins [1] emphasized the notion of an “almost subminimax” procedure² and
gave an example of such a procedure. The examples in this paper have been
constructed with a view to simplicity and to the indication of the underlying
mechanism which makes subminimax solutions exist in certain decision problems. At the
same time we point out another potentially undesirable property of a minimax
procedure: biasedness.

All our examples fall within the following framework. A sample of one is
taken from a population whose distribution is one of n given distributions:
F_1(x), F_2(x), ..., F_n(x). There are n decisions: d_1, ..., d_n. The weight function
is W(F_i, d_j) = 0 if i = j and = 1 otherwise. Instead of a finite number of F's, we
may have a sequence of F's with a corresponding sequence of decisions. In all
our examples each of the F's will be a uniform distribution over a finite interval
of the x-axis, and our decision procedures will be randomized. These restrictions
are made only for arithmetical simplicity.

With this setup, the risk when F_i is the true distribution is equal to the
probability of not making decision d_i, which we will denote r(F_i). We will not give an
exact definition of an almost subminimax procedure, but just say that a procedure
is almost subminimax if its maximum risk is “a little greater” than that of the
minimax procedure (which risk is the same for all minimax procedures in our
examples) and on the other hand its risk is “a lot less” than that of the minimax
for “most of” the F's. Our examples will conform with this “definition” for almost
any reasonable interpretation of the phrases in the quotes.

The first example will give an indication of the mechanism which makes a
subminimax example possible. Let F_1(x) be the uniform distribution on the
interval 1 − a to 1, where a > 0 and small. Let F_2(x) be the uniform distribution
on the interval 0 to 1. An admissible minimax procedure to decide between d_1
and d_2 is to accept d_2 for 0 < x < 1 − a and for 1 − a < x < 1 to accept d_1
with probability p_1 and accept d_2 with probability 1 − p_1, where p_1 = 1/(1 + a).
With this procedure r(F_1) = r(F_2) = a/(1 + a).

¹ Research done under a contract with the Office of Naval Research.

² The examples of this paper fall into the framework of the definition in [1] of an
“asymptotically subminimax solution” if each example is replaced by a sequence of examples whose
a's approach zero. The present nomenclature was suggested as more suitable here

Let us compare this procedure with the procedure which tells us to accept
d_2 for 0 < x < 1 − a and to accept d_1 for 1 − a < x < 1. For this procedure we
have r(F_1) = 0, and r(F_2) = a. Thus we see that we can reduce the risk under
F_1 to its absolute minimum 0, while increasing the risk under F_2 only slightly.
(In fact, the ratio of the two risks under F_2, namely, a and a/(1 + a), approaches 1 as
a → 0.) This example, which may seem meaningless because a/(1 + a) → 0 as a → 0,
was given mainly to help in understanding the underlying mechanism in the
almost subminimax example which follows. In the latter the maximum risk will
be > 1/3 for all a.

Let F_3(x) be a uniform distribution from −a to 1 − a. In deciding between
F_2(x) and F_3(x), an admissible minimax procedure is to accept d_2 for 1 − a <
x < 1, to accept d_3 for −a < x < 0, and to accept d_2 and d_3 with probability
1/2 each for 0 < x < 1 − a. The minimax risk is (1/2)(1 − a). For a small, this is
near 1/2 and the two distributions are so intermeshed that there is little hope to
disentangle them. When we now consider the problem of deciding between
F_1, F_2, and F_3, we expect that the addition of F_1 can not do much to aggravate
the difficulty already present in trying to decide between F_2 and F_3.

The following is an admissible minimax procedure for deciding between F_1,
F_2, and F_3:

for −a < x < 0, we accept d_3;
for 0 < x < 1 − a, we accept d_2 with probability p_2 and d_3
with probability 1 − p_2;
for 1 − a < x < 1, we accept d_1 with probability p_1 and d_2
with probability 1 − p_1;

where p_1 = (1 + a)/(2 + a) and p_2 = 1/[(2 + a)(1 − a)]. For this procedure,
r(F_1) = r(F_2) = r(F_3) = 1/(2 + a).

Consider the alternative procedure which is exactly the same as the minimax
procedure except that for 1 − a < x < 1 we always accept d_1. For this procedure,

r(F_1) = 0;  r(F_2) = 1/(2 + a) + a/(2 + a);  r(F_3) = 1/(2 + a).

Thus the alternative procedure reduces the risk from 1/(2 + a) to 0 under F_1,
increases it under F_2 by a/(2 + a) (→ 0 as a → 0), and leaves it unaltered
under F_3. The alternative looks more attractive.

This last example can be altered slightly so as to appear more striking. We
can replace the distribution F_1(x) by a sequence of distributions F^1(x), F^2(x), ...,
F^n(x), ..., where F^n(x) is the uniform distribution on the interval

(1 − a) + a/2^n < x < (1 − a) + a/2^(n−1).

Call this interval I_n (n = 1, 2, ...). Corresponding to the distributions F_2(x),
F_3(x), F^1(x), ..., F^n(x), ..., there are decisions d_2, d_3, d^1, ..., d^n, ....
An admissible minimax procedure is described as follows:

for −a < x < 0, accept d_3;
for 0 < x < 1 − a, accept d_2 with probability p_2 and d_3
with probability 1 − p_2;
for x ∈ I_n, accept d^n with probability p_1 and d_2 with probability 1 − p_1;

where p_1 = (1 + a)/(2 + a) and p_2 = 1/[(2 + a)(1 − a)]. For this procedure,

r(F_2) = r(F_3) = 1/(2 + a) = r(F^j) for j = 1, 2, ....

Consider the following alternative procedure:

for −a < x < 0, accept d_3;
for 0 < x < 1 − a, accept d_2 with probability p_2 and d_3
with probability 1 − p_2;
for x ∈ I_n, accept d^n;

where p_2 = 1/[(2 + a)(1 − a)]. For this procedure,

r(F^1) = r(F^2) = ... = 0,  r(F_2) = 1/(2 + a) + a/(2 + a),  r(F_3) = 1/(2 + a).

If a is sufficiently small, the alternative procedure is certainly almost
subminimax in the sense of our third paragraph: the maximum of the risk of the
alternative procedure is only a/(2 + a) greater than that of the minimax procedure,
and the alternative procedure has reduced the risk to zero for all except two of
the distributions.
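
The risks quoted for the three-distribution example (F_1, F_2, F_3) can be checked by direct computation. The following is a minimal sketch in exact rational arithmetic; the function name `risks` and the trial value a = 1/10 are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction


def risks(a, p1, p2):
    """Risks r(F_1), r(F_2), r(F_3) of the randomized rule with parameters p1, p2.

    F_1 is uniform on (1 - a, 1), F_2 on (0, 1), F_3 on (-a, 1 - a).
    The risk under F_i is the probability of not accepting d_i.
    """
    a = Fraction(a)
    zero, one = Fraction(0), Fraction(1)
    support = {1: (1 - a, one), 2: (zero, one), 3: (-a, 1 - a)}
    # Each entry: (interval of x, probability of each decision on that interval).
    rule = [
        ((-a, zero), {3: one}),
        ((zero, 1 - a), {2: p2, 3: 1 - p2}),
        ((1 - a, one), {1: p1, 2: 1 - p1}),
    ]
    result = {}
    for i, (lo, hi) in support.items():
        correct = zero
        for (rlo, rhi), probs in rule:
            overlap = max(zero, min(hi, rhi) - max(lo, rlo))
            correct += overlap / (hi - lo) * probs.get(i, zero)
        result[i] = 1 - correct
    return result


a = Fraction(1, 10)                      # illustrative small value of a
p1 = (1 + a) / (2 + a)
p2 = 1 / ((2 + a) * (1 - a))
minimax = risks(a, p1, p2)               # equal risks 1/(2 + a)
alternative = risks(a, Fraction(1), p2)  # always accept d_1 on (1 - a, 1)
```

With a = 1/10 the minimax rule has risk 10/21 under every distribution, while the alternative has risks 0, 11/21 and 10/21, in agreement with the formulas above.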

A decision procedure for deciding which of a class of distribution functions
is the true distribution of X is said to be unbiased for F_i if Prob(d_i | F_i) ≥
Prob(d_j | F_i) for all j. If a procedure is not unbiased for F_i, it will be said to be
biased for F_i. In the next example every minimax procedure is biased for F_1.
Let F_4(x) be the uniform distribution on the interval 0 to 1 − a, with a < 1/2.
The problem is to decide between F_1, F_2, F_3, and F_4. An admissible minimax
procedure is described as follows:

for −a < x < 0, accept d_3;
for 0 < x < 1 − a, accept d_2 with probability p_2,
                   accept d_3 with probability p_3,
                   accept d_4 with probability 1 − p_2 − p_3;
for 1 − a < x < 1, accept d_1 with probability p_1,
                   accept d_2 with probability 1 − p_1;

where p_1 = (1 + a)/3, p_2 = (1 − a + a²)/[3(1 − a)], p_3 = (1 − 2a)/[3(1 − a)].

Thus, we have Prob(d_2 | F_1) = (2 − a)/3 > (1 + a)/3 = Prob(d_1 | F_1).

This shows that the procedure is biased for F_1. By altering the procedure so
that p_2 = p_3 = 1/3 and p_1 = 1, we obtain a procedure which is unbiased for all
F_i, and whose maximum risk is increased by only (2/3)a over the minimax risk of
(2 − a)/3.
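
The biasedness claim can be verified numerically in the same way. The sketch below tabulates Prob(d_j | F_i) for the four-distribution example; the helper names and the trial value a = 1/10 are assumptions made for illustration only.

```python
from fractions import Fraction as Fr


def decision_probs(a, p1, p2, p3):
    """Return table[i][j] = Prob(d_j | F_i) for the four-distribution rule."""
    a = Fr(a)
    zero, one = Fr(0), Fr(1)
    support = {1: (1 - a, one), 2: (zero, one), 3: (-a, 1 - a), 4: (zero, 1 - a)}
    rule = [
        ((-a, zero), {3: one}),
        ((zero, 1 - a), {2: p2, 3: p3, 4: 1 - p2 - p3}),
        ((1 - a, one), {1: p1, 2: 1 - p1}),
    ]
    table = {}
    for i, (lo, hi) in support.items():
        row = {j: zero for j in support}
        for (rlo, rhi), probs in rule:
            mass = max(zero, min(hi, rhi) - max(lo, rlo)) / (hi - lo)
            for j, p in probs.items():
                row[j] += mass * p
        table[i] = row
    return table


def max_risk(table):
    """Largest risk 1 - Prob(d_i | F_i) over the four distributions."""
    return max(1 - table[i][i] for i in table)


def is_unbiased(table):
    """True if Prob(d_i | F_i) >= Prob(d_j | F_i) for all i and j."""
    return all(table[i][i] >= table[i][j] for i in table for j in table)


a = Fr(1, 10)  # illustrative value
minimax = decision_probs(a, (1 + a) / 3,
                         (1 - a + a * a) / (3 * (1 - a)),
                         (1 - 2 * a) / (3 * (1 - a)))
altered = decision_probs(a, Fr(1), Fr(1, 3), Fr(1, 3))
```

For this value of a the minimax rule has constant risk (2 − a)/3 = 19/30 and is biased for F_1, while the altered rule is unbiased and its maximum risk exceeds the minimax risk by (2/3)a.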
The above example may be altered in the same way as the example of an
almost subminimax solution so that there are infinitely many distributions for all
but three of which the minimax solution is biased. In fact, it is possible to
construct an example of a biased minimax solution for deciding among any number
of distributions greater than two. It is impossible for a minimax procedure to be
biased when there are only two distributions.

Along similar lines an example can be constructed for any ε > 0 for a
continuum of distributions, where any minimax procedure has constant risk of
2/3 − ε and is biased for all but three distributions, and where there exists an
alternative almost subminimax procedure which is unbiased for all distributions
and which reduces the risk to zero for all but three distributions where it is
increased by less than 3ε.
ON THE DISTRIBUTION OF AN ANALOGUE OF STUDENT'S t
By K. C. S. Pillai


University of Travancore, Trivandrum 


1. Introduction. The independence of the sample range and the mean in
random samples from a normal population has been already established [1], [2].
Using this property of independence, J. F. Daly [1] has shown that with the
help of the distribution law of the sample range, w, tabulated by Pearson and
Hartley [3] the probability distribution of an analogue of Student's t-test given
by G = (x̄ − a)/w can be studied, where x̄ is the mean and a the location
parameter. E. Lord [2] has prepared, by quadrature, exhaustive tables of levels of
significance of G for sample size varying from 2 to 20 corresponding to the
probabilities 0.10, 0.05, 0.02, 0.01, 0.002, and 0.001. An approximation to the
distribution of u = const. G has been studied by P. B. Patnaik [4] and the
power of the u-test has been investigated by Lord [5]. E. S. Pearson [6] has
examined the effect of nonnormality on the u-test involving range. The purpose of the
present note is to develop the probability distribution of G as a series whose
terms reduce to Beta functions and to observe the efficiency of the G-test.


2. The distribution of G. For a sample of size n from a normal population
with mean a and standard deviation unity the distribution of x̄ is given by

(1) p(\bar{x}) = \sqrt{n/(2\pi)}\; e^{-n(\bar{x} - a)^2/2},

and the distribution of the semi-range, W, has been shown by the author [7]
to be given by

(2) p(W) = k\, e^{-(n+4)W^2/6} \sum_{i=0}^{\infty} C_i W^{n+2i-2},

where 

ae 

2°16 (5)n7/ 


(3) k= (/2/x)"" 


and C_i are functions of n. The distribution (2) is useful for small sample sizes,
and C_i coefficients have been computed for sample sizes up to 8, using an
appropriate expansion [8] of the normal probability integral.

Since x̄ and W are independently distributed,

(4) p(\bar{x}, W) = k \sqrt{n/(2\pi)}\; e^{-n(\bar{x} - a)^2/2}\, e^{-(n+4)W^2/6} \sum_{i=0}^{\infty} C_i W^{n+2i-2}.

Putting

(5) G = (\bar{x} - a)/(2W),

we have

(6) p(G, W) = 2k \sqrt{n/(2\pi)}\; e^{-[2nG^2 + (n+4)/6] W^2} \sum_{i=0}^{\infty} C_i W^{n+2i-1},

and hence, integrating out W from (6),

(7) p(G) = k \sqrt{n/(2\pi)} \sum_{i=0}^{\infty} \frac{C_i\, \Gamma[\tfrac{1}{2}n + i]}{[2nG^2 + (n+4)/6]^{\frac{1}{2}n + i}}.

The terms of (7) are easily seen to reduce to Beta functions and hence the
probability integrals of G can be evaluated with the help of tables of the
incomplete Beta function.


3. Confidence intervals. For a given probability level α and sample size n,
using distribution (7), the limit, λ, may be obtained from the relation

(8) \int_0^{\lambda} p(G)\, dG = (1 - \alpha)/2,

and the average length of confidence interval for the location parameter a based
on G will be given by

(9) I(G) = 2\lambda E(w).

E(w) for sample size ranging from 2 to 1000 has been computed by L. H. C.
Tippett [9], and with the help of Tippett's table of mean range confidence
intervals, I(G), for different values of α and n have been obtained. Values of (1/2σ) I(G)
are given in Table I. The average length of confidence interval I(G) for a
particular value of α and n may be compared with the corresponding average length
of confidence interval based on Student's t-test, given by

(10) I(t) = 2\lambda' E(S)/\sqrt{n},

where

S^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/(n - 1)

and λ' is given by

(11) \int_0^{\lambda'} p(t)\, dt = (1 - \alpha)/2,

p(t) being the probability function of t.

Values of (1/2σ) I(t) for different values of α and n are given in Table I. The
comparative efficiency of the G and t-tests may be studied by observing

(12) E = I(t)/I(G),

and the values of E have been provided in Table I.
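
The Student's t column of Table I can be reproduced from (10) with the standard expression for E(S) in normal samples. The sketch below is an illustration under stated assumptions: the two-sided 5 per cent points of t are typed in from standard tables rather than computed, and the function names are hypothetical.

```python
import math


def expected_sd(n):
    """E(S)/sigma for a normal sample of size n, S^2 having divisor n - 1."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)


def half_length_t(n, t_point):
    """(1/2 sigma) I(t) = lambda' E(S) / (sigma sqrt(n)), cf. equation (10)."""
    return t_point * expected_sd(n) / math.sqrt(n)


# Two-sided 5 per cent points of Student's t with n - 1 degrees of freedom,
# taken from standard tables (assumed inputs, not computed here).
T_POINTS_05 = {3: 4.303, 4: 3.182, 5: 2.776}

values = {n: half_length_t(n, t) for n, t in T_POINTS_05.items()}
```

The resulting values agree to about four decimals with the (1/2σ) I(t) entries 2.2017, 1.4658 and 1.1670 that appear at the .050 level in Table I.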
TABLE I
Confidence interval for a based on G and t-tests

  n      α     (1/2σ) I(G)¹   (1/2σ) I(t)      E

  3    .100      1.4979         1.4941       .9975
       .050      2.2071         2.2017       .9976
       .010      5.0913         5.0783       .9974
       .001     16.2148        16.1675       .9971

  4    .100      1.0891         1.0839       .9952
       .050      1.4761         1.4658       .9930
       .010      2.7083         2.6907       .9931

  5    .100      0.9025         0.8963       .9931
       .050      1.1792         1.1670       .9897
       .010      1.9608         1.9354       .9870

  6    .050      1.0112         0.9987       .9876
       .010      1.5916         1.5663       .9841

  7    .050      0.9006         0.8873       .9852
       .010      1.3711         1.3442       .9804

  8    .050      0.8200         0.8070       .9841
       .010      1.2214         1.1938       .9774

 10    .050      0.7078         0.6957       .9829
       .010      1.0248         0.9996       .9754

 12    .050      0.6321         0.6211       .9826
       .010      0.9026         0.8765       .9711

 14    .050      0.5791         0.5663       .9779
       .010      0.8142         0.7897       .9699

 17    .050      0.5167         0.5062       .9797
       .010      0.7212         0.6974       .9670

 20    .050      0.4706         0.4619       .9815
       .010      0.6536         0.6313       .9659

¹ In computing I(G) for n > 8 the values of λ have been taken from Lord's Tables [2]
It is interesting to note from Table I that in small samples the average length
of confidence interval for the G test compares favourably with that for the
t-test. The test has the advantage that it is easier for computation.
A PROPERTY OF SOME TYPE A REGIONS
By Herman Chernoff
University of Illinois


1. Summary. In a test of an hypothesis one may regard a sample in the
critical region as evidence that the hypothesis is false. Let us assume that for
some reason it is desired to increase the critical size of the test, i.e., to make
rejection of the hypothesis more probable. Then one may expect that an
observation which led to rejection in the first test should still lead to rejection in
the new test. In other words, one should expect W_α ⊃ W_α' if α > α', where
W_α is the critical region for the test of size α. An example is given where regions
of type A¹ are uniquely specified except for sets of measure zero, but fail to have
this property.


2. Example. We shall consider type A regions for the hypothesis θ = 0 where
our sample consists of one observation with density

p(x, \theta) = (2\pi)^{-1/2} (1 + \theta)^{1/2} \exp[-(x - \theta)^2 (1 + \theta)/2]  for θ > −1.

¹ Regions of type A were introduced by Neyman and Pearson (see [1]).

Since p(x, θ) permits differentiation under the integral sign, i.e.,

\frac{\partial^i}{\partial\theta^i} \int_W p(x, \theta)\, dx = \int_W \frac{\partial^i p(x, \theta)}{\partial\theta^i}\, dx  for i = 1, 2,

then W is a region of type A of critical size α for the hypothesis θ = 0 if

(1) \int_W p(x, 0)\, dx = \alpha,

(2) \int_W \frac{\partial p(x, 0)}{\partial\theta}\, dx = 0,

and if for every region W' satisfying equations (1) and (2)

(3) \int_W \frac{\partial^2 p(x, 0)}{\partial\theta^2}\, dx \ge \int_{W'} \frac{\partial^2 p(x, 0)}{\partial\theta^2}\, dx.

As a consequence of the results of Dantzig and Wald which made use of the
Neyman-Pearson Fundamental Lemma (see [1] and [2]), it follows that there is
a region of type A for any α, 0 ≤ α ≤ 1, and that a necessary and sufficient
condition that W be a region of type A of size α is that equations (1) and (2)
hold and that there exist real numbers k_1 and k_2 so that, with the possible
exception of a set of measure zero,

(4) \frac{\partial^2 p(x, 0)}{\partial\theta^2} \ge k_1\, p(x, 0) + k_2\, \frac{\partial p(x, 0)}{\partial\theta}  for x in W

and the opposite inequality holds for x not in W.

We shall show that for α close to zero, a critical region of type A consists of
the union of three intervals (−∞, a), (b, c), and (d, ∞), where a < 1 < b <
c < d (except possibly for a set of measure zero). Furthermore, as α → 0, a → −∞,
d → ∞, and b and c → 1. From this last remark it follows that there are critical
sizes α and α', α > α', so that the corresponding critical regions of type A,
W_α and W_α', are uniquely specified except for sets of measure zero, but a
subset of W_α' of positive measure is not in W_α.


3. Proof.
Part 1. First we note that inequality (4) reduces to

(5a) (2x - 3/2) + (1/2 + x - x^2/2)^2 \ge k_1 + k_2 (1/2 + x - x^2/2)  for x in W.

This inequality in turn reduces to

(5b) z^4 + c_1 z^2 + z + c_2 \ge 0  for x in W,

where z = (x − 1)/2 and c_1 and c_2 are real numbers depending on k_1 and k_2.
Hence, for 0 < α < 1, W is (except possibly for a set of measure zero) either
the union of two intervals (−∞, a) and (b, ∞), where a < b, or the union of
three intervals (−∞, a), (b, c), and (d, ∞), where a < b < c < d, depending
on the number of real roots of z^4 + c_1 z^2 + z + c_2 = 0.

Let us define φ(t) = (2π)^{-1/2} exp[−t²/2],

A(t) = \int_{-\infty}^{t} \varphi(x)\, dx,  B(t) = -\int_{-\infty}^{t} x\varphi(x)\, dx = \varphi(t),  C(t) = \int_{-\infty}^{t} x^2\varphi(x)\, dx = -t\varphi(t) + A(t),

and ψ(t) = (t − 2)φ(t). Then we see that in the case where W is the union of two intervals,
equations (1) and (2) reduce to

(6) A(b) − A(a) = 1 − α,
(7) ψ(b) = ψ(a),

and in the case of three intervals equations (1) and (2) reduce to

(8) A(d) − A(c) + A(b) − A(a) = 1 − α,
(9) ψ(d) − ψ(c) + ψ(b) − ψ(a) = 0.

Since ψ(t) > 0 for t > 2 and ψ(t) < 0 for t < 2, the largest value of A(b) −
A(a) consistent with equation (7) is \int_{-\infty}^{2} \varphi(t)\, dt < .98. Thus for α sufficiently
small, a region of type A must consist of the union of three intervals except
possibly for a set of measure zero.
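
The reduction of (4) to (5a) and the bound .98 can be checked numerically. The sketch below assumes the density has the form p(x, θ) = (2π)^{-1/2}(1 + θ)^{1/2} exp[−(x − θ)²(1 + θ)/2] and compares finite-difference derivatives at θ = 0 with the closed forms implied by (5a); the function names and step sizes are illustrative choices.

```python
import math


def density(x, theta):
    """Assumed form of p(x, theta): normal with mean theta and variance 1/(1 + theta)."""
    return math.sqrt((1.0 + theta) / (2.0 * math.pi)) * math.exp(
        -(x - theta) ** 2 * (1.0 + theta) / 2.0)


def phi(t):
    """Standard normal density."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)


def A(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def psi(t):
    return (t - 2.0) * phi(t)


def score(x):
    """(dp/dtheta)/p at theta = 0, as read off from (5a)."""
    return 0.5 + x - x * x / 2.0


def d1(x, h=1e-5):
    """Central difference for dp/dtheta at theta = 0."""
    return (density(x, h) - density(x, -h)) / (2.0 * h)


def d2(x, h=1e-4):
    """Central difference for the second derivative in theta at theta = 0."""
    return (density(x, h) - 2.0 * density(x, 0.0) + density(x, -h)) / (h * h)


max_error = 0.0
for x in (-2.0, -0.5, 0.0, 1.0, 2.5):
    expected_d1 = score(x) * phi(x)
    expected_d2 = ((2.0 * x - 1.5) + score(x) ** 2) * phi(x)
    max_error = max(max_error, abs(d1(x) - expected_d1), abs(d2(x) - expected_d2))

largest_two_interval_mass = A(2.0)  # supremum of A(b) - A(a) subject to (7)
```

The finite differences agree with the closed forms to well within the discretization error, and A(2) is about .977, which is below the bound .98 used above.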


Part 2. That a → −∞ and d → ∞ as α → 0 is obvious. Equation (9) prevents
b and c from both being less than 1 − √2 (the minimizing value of ψ(t)) or
from both exceeding 1 + √2 (the maximizing value of ψ(t)). Since the interval
(b, c) must then have some points in common with the interval (1 − √2, 1 + √2)
it follows that b and c are bounded as α → 0.

The roots of equation (5b) are a* = (a − 1)/2, b* = (b − 1)/2, c* = (c − 1)/2,
and d* = (d − 1)/2. Since the coefficients of z³ and z are zero and one,
respectively, we have (a* + d*) + (b* + c*) = 0 and (a* + d*)b*c* + a*d*(b* + c*) =
−1. Setting a* + d* = −2L, we have a*d* = b*c* − 1/(2L). Since a*d* →
−∞, L → 0+. Then a* and d* are both of the order of magnitude of L^(−1/2). Let
b* = L − e. Then c* = L + e and equation (8) gives us e = O(α). Applying
equation (9) and noting that for z = 0, x = 1 and ψ'(1) > 0, we see that e > 0,
and e = O[ψ(a) − ψ(d)] = o(L). The two results L → 0+ and e = o(L) imply
that for α small enough b and c → 1 but b and c exceed one. This is the desired
result.

² We write a = o(b) if a/b → 0 as α → 0 and a = O(b) if there is a constant k so that
|a| < k |b| for α small enough.
A PROPERTY OF SOME TESTS OF COMPOSITE HYPOTHESES


By C. M. Stein
University of Chicago


In all common statistical tests, a result significant at the 1 per cent level is
necessarily significant at the 5 per cent level. In this note we show that this
statement is not true for all statistical tests. More precisely, for any α_1, α_2
satisfying 0 < α_1 < α_2 < 1, we construct a composite hypothesis H_0 and a
simple hypothesis H_1 such that there are sets w_1, w_2 in the sample space which
are the unique most powerful critical regions of size α_1, α_2, respectively, for
testing H_0 against H_1. Furthermore, w_1, w_2 are similar regions. But w_2 does not
contain w_1.

Let X be a random variable which can take one of the four values 1, 2, 3, 4.
Let H_0 consist of two simple hypotheses H_0' and H_0'', where H_0' states that
P{X = i} = p_i', and H_0'' states that P{X = i} = p_i''; and let H_1 state that
P{X = i} = p_i for i = 1, 2, 3, 4. Later, we shall determine appropriate positive
values for the p_i, p_i', p_i''. Let π_i' = p_i'/p_i, π_i'' = p_i''/p_i. By a slight
modification of the Neyman-Pearson lemma [1] (see also [2]), the region w_1 consisting
of the points x = 1 and x = 2, and the region w_2 consisting of the points x = 1
and x = 3, are both most powerful critical regions and similar if and only if

(a) p_1π_1' + p_2π_2' = p_1π_1'' + p_2π_2'',
    p_1π_1' + p_3π_3' = p_1π_1'' + p_3π_3'';
(b) there exist constants a_1, a_2, b_1, b_2 ≥ 0 with a_1 + b_1 > 0, a_2 + b_2 > 0,
such that a_1π_1' + b_1π_1'', a_1π_2' + b_1π_2'' are both less than or equal to a_1π_3' + b_1π_3'',
a_1π_4' + b_1π_4'', and a_2π_1' + b_2π_1'', a_2π_3' + b_2π_3'' are both less than or equal to
a_2π_2' + b_2π_2'', a_2π_4' + b_2π_4''. Expressed geometrically in the (π', π'')-plane, if
a_1, a_2, b_1, b_2 > 0 and “less than” holds in all the above relations (which
will always be the case in our construction), this means that the line joining
points 2 and 3 intersects both axes at positive values, and the point 1 is inside
and point 4 outside the triangle formed by this line and the coordinate axes. Of
course H_0', H_0'', H_1 are all probability distributions and all of the points 1, 2,
3, 4 are to have positive probabilities, so that we want
(c) Σ p_i = 1, Σ p_iπ_i' = 1, Σ p_iπ_i'' = 1;
(d) p_i > 0, π_i' > 0, π_i'' > 0.
We shall show that conditions (a), (b), (c), (d) can be satisfied in a great
variety of ways. Choose π'_i0, π''_i0 so that π'_10 > π''_10, π''_20 > π'_20, π''_30 > π'_30, π'_40 >
π''_40, and (b) is satisfied when π_i', π_i'' are replaced by π'_i0, π''_i0, respectively. Let
p_10 be an arbitrary nonnegative number. Choose p_20, p_30 > 0 so that

p_10π'_10 + p_20π'_20 = p_10π''_10 + p_20π''_20,
p_10π'_10 + p_30π'_30 = p_10π''_10 + p_30π''_30.
That this is possible is most easily seen geometrically by observing that the
line π' = π'' separates point 1 from points 2 and 3, so that there exist weights
p_10, p_20, p_30 for points 1, 2, 3, respectively, so that the center of gravity of 1 and
2 lies on the line π' = π'', as does that of 1 and 3. Also the center of gravity of
these three points with the assigned weights lies on the same side of π' = π''
as 2 and 3 while 4 lies on the opposite side. Thus we can determine p_40 so that

\sum_{i=1}^{4} p_{i0}\pi'_{i0} = \sum_{i=1}^{4} p_{i0}\pi''_{i0}.

Finally we take

p_i = \frac{p_{i0}}{\sum_{j=1}^{4} p_{j0}},  \pi_i' = \frac{\pi'_{i0} \sum_{j=1}^{4} p_{j0}}{\sum_{j=1}^{4} p_{j0}\pi'_{j0}},  \pi_i'' = \frac{\pi''_{i0} \sum_{j=1}^{4} p_{j0}}{\sum_{j=1}^{4} p_{j0}\pi''_{j0}}.

Then all the conditions (a), (b), (c), (d) are satisfied. By similar reasoning it is
easy to see that the parameters can be chosen so that w_1 and w_2 have arbitrary
sizes α_1 and α_2, respectively.
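
The construction can be followed step by step on a concrete set of numbers. The starting ratios below are hypothetical values chosen only to satisfy the stated inequalities, and the constants (a_1, b_1) = (1, 9/10), (a_2, b_2) = (9/10, 1) are likewise illustrative; none of these figures come from the paper.

```python
from fractions import Fraction as Fr

# Hypothetical starting ratios pi'_i0 and pi''_i0 for the points i = 1, 2, 3, 4.
pi1 = {1: Fr(1), 2: Fr(1, 2), 3: Fr(3, 2), 4: Fr(4)}   # pi'_i0
pi2 = {1: Fr(1, 2), 2: Fr(3), 3: Fr(2), 4: Fr(1)}      # pi''_i0

# Weights: p_10 arbitrary; p_20, p_30 put the centres of gravity of (1, 2) and
# (1, 3) on the line pi' = pi''; p_40 then balances the two totals.
p0 = {1: Fr(1)}
for j in (2, 3):
    p0[j] = p0[1] * (pi1[1] - pi2[1]) / (pi2[j] - pi1[j])
excess = sum(p0[i] * (pi2[i] - pi1[i]) for i in (1, 2, 3))
p0[4] = excess / (pi1[4] - pi2[4])

S = sum(p0.values())
T1 = sum(p0[i] * pi1[i] for i in p0)
T2 = sum(p0[i] * pi2[i] for i in p0)

p = {i: p0[i] / S for i in p0}                    # H_1
p_prime = {i: p0[i] * pi1[i] / T1 for i in p0}    # H_0'
p_dprime = {i: p0[i] * pi2[i] / T2 for i in p0}   # H_0''


def size(region, dist):
    """Probability of a critical region under a distribution."""
    return sum(dist[i] for i in region)


w1, w2 = {1, 2}, {1, 3}
alpha1, alpha2 = size(w1, p_prime), size(w2, p_prime)

# Normalized ratios and the two linear forms of condition (b).
r1 = {i: p_prime[i] / p[i] for i in p}
r2 = {i: p_dprime[i] / p[i] for i in p}
f1 = {i: Fr(1) * r1[i] + Fr(9, 10) * r2[i] for i in p}
f2 = {i: Fr(9, 10) * r1[i] + Fr(1) * r2[i] for i in p}
condition_b = (max(f1[1], f1[2]) < min(f1[3], f1[4])
               and max(f2[1], f2[3]) < min(f2[2], f2[4]))
```

With these numbers w_1 = {1, 2} has size 33/98 and w_2 = {1, 3} has size 75/98 under both H_0' and H_0'', so both regions are similar, yet the larger region w_2 does not contain w_1.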
It is possible to obtain cases where H_0 contains a continuum of simple
hypotheses, for example

H_0(λ): P{X = i} = λp_i' + (1 − λ)p_i'',

with 0 ≤ λ ≤ 1, where p_i', p_i'' are obtained as in the main part of this paper.
The same tests are most powerful and similar. Many interesting questions arise
but they seem not to be of any real statistical importance.
NOTE ON THE ESTIMATION OF A BIVARIATE DISTRIBUTION
FUNCTION


By Paul B. Simpson
University of Oregon

A continuous cumulative probability distribution F(x) can be estimated from
a random sample (x_i), i = 1, ..., n, by the step function G(x) = j/n, where
j is the number of x_i ≤ x. In this single variable case, it is known that the
probability distribution

(1) P{max | F(x) − G(x) | < λ}

is the same for all distributions F(x) [2]. It might be expected that a similar
invariant property would hold for continuous bivariate distributions. An
example shows that such is not the case.
Consider the cumulative distribution
2 o 
’ ax 2—a)yz 
(2) F(e,y) = 24 4+ S— oye 


> 


0 <¢:s:<:1, 0O<y<l. 
Let

(a) (x_1, y_1) be a random observation from F(x, y);

(b) G(x, y) = 1 for x ≥ x_1 and y ≥ y_1,
    G(x, y) = 0 for x < x_1 or y < y_1;

(c) 1/2 < λ < 1;

(d) J be the set of points (x, y) fulfilling the three conditions: F(x, y) >
(1 − λ), F(x, 1) < λ, F(1, y) < λ.
It follows that

(3) P[(x_1, y_1) ∈ J] = P[| F(x, y) − G(x, y) | < λ for all x, y].

For λ = .72 and a = 0, .065 < P[(x_1, y_1) ∈ J] < .066.
For λ = .72 and a = 1, .057 < P[(x_1, y_1) ∈ J] < .058.

Thus, there are two continuous bivariate distributions for which the
probabilities of the type (1) differ.
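
As an illustration of how a probability of the form P[(x_1, y_1) ∈ J] is evaluated, the sketch below takes the simplest bivariate distribution on the unit square, F(x, y) = xy (independent uniform margins). This choice of F is an assumption made for the example and is not taken from equation (2); the function names are hypothetical.

```python
import math


def prob_in_J_independent(lam):
    """P[(x1, y1) in J] when F(x, y) = x * y on the unit square (assumed F).

    Here J = {x * y > 1 - lam, x < lam, y < lam}, and (x1, y1) is uniform on
    the square, so the probability is the area of J.
    """
    c = 1.0 - lam
    lo = c / lam          # smallest x for which some y < lam gives x * y > c
    if lo >= lam:
        return 0.0
    # Area = integral over lo < x < lam of (lam - c / x) dx.
    return lam * (lam - lo) - c * math.log(lam / lo)


def prob_in_J_grid(lam, steps=1000):
    """Crude midpoint-grid check of the same area."""
    hits = 0
    for i in range(steps):
        x = (i + 0.5) / steps
        for j in range(steps):
            y = (j + 0.5) / steps
            if x * y > 1.0 - lam and x < lam and y < lam:
                hits += 1
    return hits / float(steps * steps)


p_closed = prob_in_J_independent(0.72)
p_grid = prob_in_J_grid(0.72, steps=400)
```

For λ = .72 the closed form gives a value between .065 and .066 under this assumed F, and the grid count agrees with it to within the resolution of the grid.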

If we consider a set of points of the independent variables (x, y) lying on an
increasing function y of x, we reduce the bivariate problem to a single-variable
problem. Let F(x, y) be a cumulative distribution, −∞ < x < +∞, −∞ <
y < +∞, ∂²F/∂x∂y continuous almost everywhere. Consider a random sample
(x_i, y_i), i = 1, ..., n. Let G(x, y) = j/n, where j is the number of observations
(x_i, y_i) such that x_i ≤ x and y_i ≤ y. Let v = f(u) be any increasing continuous
function, −∞ < u < +∞, such that v → ±∞ as u → ±∞. Define a set of
points (u_i, v_i) as follows: For given observation (x_i, y_i), if y_i > f(x_i), let v_i =
y_i, and y_i = f(u_i). If y_i ≤ f(x_i), let u_i = x_i and v_i = f(x_i). Order the set such
that u_{j−1} < u_j, j = 2, ..., n. Let 0 < λ < 1. Since the maximum deviations
of the step function j/n from the distribution F(x, y) over the points v = f(u)
occur at the end points of the “intervals”, we are interested in

(4) P{greater of max_j | F(u_j, v_j) − j/n | and max_j | F(u_j, v_j) − (j + 1)/n |} < λ.

The probability distribution of (4) is the same for all F(x, y) and equals the
distribution for the single-variable case (1), when the size of the sample is the
same.
To prove this let u be a vector (u_1, ..., u_n). Let U be the set of u such that
the property described in (4) holds. We have

(5) P[u \in U] = n! \int_U \prod_{j=1}^{n} dF(u_j, f(u_j)).

Let z_j = F[u_j, f(u_j)], z = (z_1, ..., z_n). Then

(6) P[u \in U] = n! \int_Z dz,

where

z ∈ Z if max_j | z_j − j/n | < λ and max_j | z_j − (j + 1)/n | < λ.

Since (6) does not depend on F(x, y), the probability is the same for all F(x, y)
with the given properties. Nor does (6) depend upon the particular choice of
f(u).

The expression (5) is the probability distribution of the type (1) for the
single-variable distribution F[x, f(x)]. We can test the hypothesis that a given random
sample was derived from a particular distribution by means of the maximum
deviation of the distribution from the step function derived from the sample.
Values of the probabilities have been tabulated by Massey [1].
ON THE NECESSARY AND SUFFICIENT CONDITIONS FOR THE
CONVERGENCE OF A SEQUENCE OF MOMENT
GENERATING FUNCTIONS


By W. Kozakiewicz


University of Montreal


In a previous paper ([1], pp. 61-69) the author studied the reciprocal relation
between the convergence of a sequence of df's (distribution functions) and the
convergence of the corresponding sequence of mgf's (moment generating
functions) in the univariate case. It is the purpose of the present paper to give
necessary and sufficient conditions for the convergence of a sequence {φ_n(t_1, t_2)} of
mgf's in two dimensions. The results can be extended to Euclidean spaces of
higher dimensions.


Let {F_n(x_1, x_2)} be a sequence of df's. For x_i > 0, let


M_n(x_1, x_2) = \iint_{u_i \ge x_i} dF_n(u_1, u_2),


and designate by M(x_1, x_2) the least upper bound of the sequence {M_n(x_1, x_2)}.
THEOREM 1. If the sequence {F_n(x_1, x_2)} converges on an everywhere dense
set and if there exist numbers α_1, α_2 such that for | t_i | < α_i,

(1) M(x_1, x_2) ≤ K exp(−| t_1 | x_1 − | t_2 | x_2),

where K is independent of x_1 and x_2, then
(a) there exists a df F(x_1, x_2) such that

(2) lim_{n→∞} F_n(x_1, x_2) = F(x_1, x_2)

at each point of continuity of F(x_1, x_2),

(b) the mgf's of F(x_1, x_2) and F_n(x_1, x_2), say φ(t_1, t_2) and φ_n(t_1, t_2), exist for
| t_i | < α_i,

(c) lim_{n→∞} φ_n(t_1, t_2) = φ(t_1, t_2) for | t_i | < α_i and uniformly in each interval
| t_i | ≤ β_i < α_i.
To prove (a), notice that there exists a function F(x_1, x_2), continuous to the
right and with nonnegative second difference, such that (2) holds at each
continuity point of F(x_1, x_2). From (1) we see that F is a df.

Now let β_i < γ_i < α_i (i = 1, 2), and denote by R_z, for z > 0, the region
| x_i | < z. Let k and l be integers such that l > k > 0. Then, from (1), we find
that, for | t_i | ≤ β_i,

(3) \left| \iint_{R_l - R_k} \exp(t_1 x_1 + t_2 x_2)\, dF_n(x_1, x_2) \right| \le C \{\exp[(\beta_1 - \gamma_1)k] + \exp[(\beta_2 - \gamma_2)k]\},

where C is independent of k and l.

The relations (2) and (3) imply the truth of (b) and (c). Thus Theorem 1 is
proved.
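
Conclusion (c) of Theorem 1 can be illustrated on a simple sequence for which the mgf's are known in closed form. The example below, two independent normal variates with variance 1 + 1/n, is an assumption chosen for illustration and does not appear in the paper; the function names are hypothetical.

```python
import math


def mgf_n(n, t1, t2):
    """mgf of two independent N(0, 1 + 1/n) variates (assumed example sequence)."""
    variance = 1.0 + 1.0 / n
    return math.exp(0.5 * variance * (t1 * t1 + t2 * t2))


def mgf_limit(t1, t2):
    """mgf of two independent N(0, 1) variates, the limiting distribution."""
    return math.exp(0.5 * (t1 * t1 + t2 * t2))


def sup_gap(n, beta, steps=20):
    """Largest |phi_n - phi| over a grid on the square |t_i| <= beta."""
    grid = [-beta + 2.0 * beta * k / steps for k in range(steps + 1)]
    return max(abs(mgf_n(n, s, t) - mgf_limit(s, t)) for s in grid for t in grid)


gaps = [sup_gap(n, beta=1.0) for n in (1, 10, 100, 1000)]
```

The largest gap on the square | t_i | ≤ 1 shrinks steadily as n grows, which is the uniform convergence on | t_i | ≤ β_i asserted in (c) for this particular sequence.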

THEOREM 2. Let {F_n(x_1, x_2)} be a sequence of df's and let {φ_n(t_1, t_2)} be the
corresponding sequence of mgf's. If φ_n(t_1, t_2) exist for | t_i | < α_i, and if there exists
a finite valued function φ(t_1, t_2) such that lim_{n→∞} φ_n(t_1, t_2) = φ(t_1, t_2), | t_i | < α_i,
then

(a) the inequality (1) holds for | t_i | < α_i,

(b) there exists a df F(x_1, x_2) such that (2) holds at each continuity point of
F(x_1, x_2),

(c) for | t_i | < α_i, the mgf of F(x_1, x_2) exists and equals φ(t_1, t_2),

(d) lim_{n→∞} φ_n(t_1, t_2) = φ(t_1, t_2) uniformly for | t_i | ≤ β_i < α_i (i = 1, 2).

To prove (a), note that for | t_i | < α_i, x_i > 0, we have

(4) \iint_{u_i \ge x_i} dF_n(u_1, u_2) \le \exp(-|t_1| x_1 - |t_2| x_2) \iint_{u_i \ge x_i} \exp(|t_1| u_1 + |t_2| u_2)\, dF_n(u_1, u_2) \le M_0 \exp(-|t_1| x_1 - |t_2| x_2),

where φ_n(| t_1 |, | t_2 |) ≤ M_0. Such a number M_0 = M_0(t_1, t_2) exists since
{φ_n(| t_1 |, | t_2 |)} converges for | t_i | < α_i. This gives an estimate for M_n(x_1, x_2),
which shows that (a) holds. The Helly selection principle ([2], pp. 60-62 and
83) leads to (b). The relations (c) and (d) follow immediately from Theorem 1.

From Theorems 1 and 2 we obtain

THEOREM 3. Let {F_n(x_1, x_2)} be a sequence of df's and let {φ_n(t_1, t_2)} be the
corresponding sequence of mgf's which are all assumed to exist for | t_i | < α_i. Then the
necessary and sufficient condition for the convergence of {φ_n(t_1, t_2)} for | t_i | < α_i
is that the relations (a) and (b) of Theorem 2 be satisfied.
A NOTE ON THE MAXIMUM VALUE OF KURTOSIS


By H. C. Picard¹
University of Ghent


In “A note on skewness and kurtosis,” J. E. Wilkins (Annals of Math. Stat.
Vol. 15 (1944), pp. 333-335) gave a short and elegant proof of the inequality for
skewness and kurtosis

(1) β_2 ≥ β_1² + 1.

Then he gave an upper bound, depending on the population size N, for the
skewness:

(2) max β_1 = (N − 2)/(N − 1)^(1/2).

Now we shall derive an upper bound for the kurtosis. It will appear that the
sign “=” in (1) is valid for the upper bounds, and the two maximum values
indeed arise in the same “extreme” population.

To find the maximum value of the kurtosis β_2 we consider the function Σx_i⁴
in the x-space, where Σx_i² = N and Σx_i = 0. We have to maximize Σx_i⁴ − λΣx_i² −
μΣx_i. The maximizing values are given by the N equations, found by
differentiation with respect to x_i,

(3) 4x_i³ − 2λx_i − μ = 0,

¹ “Aspirant” of the Belgian National Foundation for Scientific Research.
together with the two relations Σx_i² = N and Σx_i = 0. Multiplication with x_i
and summation over all N equations give

4Nβ_2 − 2Nλ = 0;

hence

max β_2 = λ/2.

Since it is not possible that all values of x in the population are equal, the
equation (3) of third degree must have at least two different real roots, and hence it
has three real roots, which we may represent by

\sqrt{2\lambda/3}\, \cos\alpha,  \sqrt{2\lambda/3}\, \cos(\alpha + \tfrac{4}{3}\pi),  \sqrt{2\lambda/3}\, \cos(\alpha + \tfrac{2}{3}\pi),

where cos 3α = 4 cos³α − 3 cos α = μ/(2λ/3)^(3/2). Suppose the number of those
roots is, respectively, k, l, and m, with sum N. Writing v for l − m, we have,
since not all values in the population can be the same,

{1 ≤ k ≤ N − 1, | v | ≤ N − k}  or  {k = 0, | v | ≤ N − 2},

with v even if N − k is even and v odd if N − k is odd. Σx_i = 0 gives, since
λ ≠ 0,

(k − ½l − ½m) cos α + (½√3 l − ½√3 m) sin α = 0,
(3k/2 − N/2) cos α + ½√3 v sin α = 0,

\tan\alpha = -\frac{3k - N}{v\sqrt{3}},  \sin\alpha = \frac{3k - N}{\sqrt{(3k - N)^2 + 3v^2}},  \cos\alpha = \frac{-v\sqrt{3}}{\sqrt{(3k - N)^2 + 3v^2}}.

Hence our second relation Σx_i² = N gives

(4) \frac{3N}{2\lambda} = k\cos^2\alpha + l(-\tfrac{1}{2}\cos\alpha + \tfrac{1}{2}\sqrt{3}\sin\alpha)^2 + m(-\tfrac{1}{2}\cos\alpha - \tfrac{1}{2}\sqrt{3}\sin\alpha)^2

  = k\cos^2\alpha + (l + m)(\tfrac{1}{4}\cos^2\alpha + \tfrac{3}{4}\sin^2\alpha) - (l - m)\cdot\tfrac{1}{2}\sqrt{3}\sin\alpha\cos\alpha

  = \frac{3kv^2}{(3k - N)^2 + 3v^2} + (N - k)\cdot\tfrac{3}{4}\,\frac{v^2 + (3k - N)^2}{(3k - N)^2 + 3v^2} + \tfrac{1}{2}v\sqrt{3}\,\frac{(3k - N)v\sqrt{3}}{(3k - N)^2 + 3v^2}

  = \tfrac{1}{2}N - \tfrac{1}{4}\,\frac{(3k - N)[(3k - N)^2 - 9v^2]}{(3k - N)^2 + 3v^2}.
To find the maximum kurtosis we have to find the minimum possible value
of the expression (4). Partial differentiation of (4) with respect to $(3k - N)$ gives

\[
\frac{-\tfrac14 (3k - N)^4 - \tfrac92 v^2 (3k - N)^2 + \tfrac{27}{4} v^4}{\{(3k - N)^2 + 3v^2\}^2}
= \frac{-\tfrac14 \{(3k - N)^2 + 9v^2\}^2 + 27 v^4}{\{(3k - N)^2 + 3v^2\}^2} \le 0.
\]

Hence $3N/2\lambda$, for every $v$, is decreasing with increasing $k$, so that we have to
take the greatest possible value of $k$. In that case $3k - N$ is certainly positive
and we have to take the smallest possible value of $|v|$ to minimize $3N/2\lambda$. In
virtue of the conditions on $k$ and $v$ we have the "extreme" combinations
$k = N - 1$, $|v| = 1$ and $k = N - 2$, $v = 0$. Substituting in (4) gives for $\lambda$,
respectively, $2(N^2 - 3N + 3)/(N - 1)$ and $N$. Since those values are equal only for
$N = 2$ or $3$, and

\[ N < 2\,\frac{N^2 - 3N + 3}{N - 1} \]

if $N \ge 4$, we find for our upper bound$^2$

\[ \max \beta_2 = \tfrac12 \lambda = \frac{N^2 - 3N + 3}{N - 1}. \]

And indeed

\[ \max \beta_2 = \max \beta_1 + 1. \]
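
The bound can be checked numerically. The following Python sketch is an illustration only and is not part of the original note: the function names are hypothetical, and $\beta_1$ is assumed to denote the squared skewness $m_3^2/m_2^3$ and $\beta_2$ the ratio $m_4/m_2^2$, with all moments computed with divisor $N$. It evaluates both quantities for the "extreme" population of $N - 1$ equal values and one differing value, and compares randomly drawn populations against the bound.

```python
import random


def beta1_beta2(values):
    """Return (beta1, beta2) of a finite population, using divisor N throughout."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2


def kurtosis_upper_bound(n):
    """Upper bound (N^2 - 3N + 3)/(N - 1) derived in the text."""
    return (n * n - 3 * n + 3) / (n - 1)


def extreme_population(n):
    """N - 1 equal values and one different value; location and scale do not matter."""
    return [0.0] * (n - 1) + [1.0]


def worst_random_kurtosis(n, trials, seed=0):
    """Largest beta2 seen among randomly drawn populations of size n."""
    rng = random.Random(seed)
    return max(
        beta1_beta2([rng.gauss(0.0, 1.0) for _ in range(n)])[1]
        for _ in range(trials)
    )


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        b1, b2 = beta1_beta2(extreme_population(n))
        print(n, b1, b2, kurtosis_upper_bound(n))
```

In such a check the extreme population should attain the bound, and its $\beta_2$ should exceed its $\beta_1$ by one, while random populations should stay at or below the bound.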
2 After writing this paper, the author derived the more general formulae: 


(N — 1)e+1 4- 


ae ae on 


(N — 1) { (N — 1)*=*# —1} 
ee tees ee eS eae 


N(N — 1)**! 


Their proof will be published shortly in the Dutch mathematics and physics periodical 
Simon Stevin. 
ABSTRACTS OF PAPERS 


(Abstracts of papers presented at the Santa Monica meeting of the Institute, June 15 and 16, 
1951) 


1. First Passage in Random Walks. T. E. Harris, The Rand Corporation. 


Consider a random walk on the integers with transition probabilities $p_r$ for $r \to r + 1$
and $q_r = 1 - p_r$ for $r \to r - 1$. Let $N_{ij}$ be the first passage time from $i$ to $j$. Explicit
expressions are given for $P(N_{ij} < \infty)$ and $E(N_{ij} \mid N_{ij} < \infty)$, the latter expression sometimes
being finite in transient chains. The second moment of $N_{ij}$ can also be found. Sufficient
conditions can be given for the limiting distribution of $N_{0j}$, as $j \to \infty$, to be exponential. 
The conditions include some walks with infinite mean recurrence times. 


2. Distinct Hypotheses and Convex Sets. Lucien M. Le Cam, University of 
California, Berkeley. 


Given a measure $\mu$ on a set $X$, the densities of probability with respect to $\mu$ on $X$ are
considered as points in a Banach space $B$. A test is a point in the unit sphere of the conjugate 
Banach space. The distinctness of two hypotheses can be discussed using their convex hulls 
in B. Theorems by Mazur, Klee, etc., give necessary and sufficient conditions for distinct- 
ness. It is shown that the theorem by A. Berger and A. Wald corresponds to the case in 
which the closed convex hulls are disjoint. The result can thus be slightly extended. 


3. Uniform Convergence of Random Functions with Applications to Statistics. 
Herman Rubin, Stanford University. 


Let $X_1, \cdots, X_n, \cdots$ be a sequence of independent and identically distributed
variables with values in an arbitrary space $X$. Let $T$ be a compact topological space, and let
$f(t, x)$ be a complex-valued function on $T \times X$, measurable in $x$ for each $t \in T$. Let $P$ be the
common distribution of the $X_i$. Then if there is an integrable $g$ such that $|f(t, x)| \le g(x)$
for all $t \in T$ and $x \in X$, and if there is a sequence $S_i$ of measurable sets such that
$P(X - \bigcup_{i=1}^{\infty} S_i) = 0$ and for each $i$, $f(t, x)$ is equicontinuous in $t$ for $x \in S_i$, then with
probability one, $(1/n) \sum_{k=1}^{n} f(t, X_k) \to \int f(t, x)\, dP(x)$ uniformly for $t \in T$, and the limit
function is continuous. Since $f(t, x) = e^{itx}$ satisfies the conditions of the theorem, the sample
characteristic function converges to the population characteristic function uniformly with
probability one in any bounded interval. $\log L(x \mid \theta) = f(x, \theta)$ satisfies the conditions of
the theorem for many distributions, including the multivariate normal, Poisson, Cauchy,
$\chi^2$, and double exponential, and hence the almost certain convergence of maximum likelihood
estimates to the true values if the parameter is restricted to a compact set is established
for those cases. More difficult estimation procedures can also be shown to be
consistent by this method. 
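
The statement about the sample characteristic function can be illustrated by simulation. The Python sketch below is not taken from the abstract: the function names are hypothetical, the population is assumed to be standard normal (so that the population characteristic function is $e^{-t^2/2}$), and the supremum over the bounded interval is only approximated on a finite grid of $t$ values.

```python
import cmath
import math
import random


def empirical_cf(sample, t):
    """Sample characteristic function (1/n) * sum of exp(i t X_k)."""
    return sum(cmath.exp(1j * t * x) for x in sample) / len(sample)


def normal_cf(t):
    """Characteristic function of the standard normal distribution."""
    return math.exp(-0.5 * t * t)


def sup_error_on_grid(n, seed=0, t_max=3.0, steps=60):
    """Largest |empirical cf - population cf| over a grid on [-t_max, t_max]."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    grid = [-t_max + 2.0 * t_max * i / steps for i in range(steps + 1)]
    return max(abs(empirical_cf(sample, t) - normal_cf(t)) for t in grid)


if __name__ == "__main__":
    # The grid-wise maximum error should tend to shrink as n grows.
    for n in (100, 1000, 4000):
        print(n, sup_error_on_grid(n))
```

A finite grid cannot prove uniform convergence; the sketch only shows the kind of behaviour the theorem describes.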


4. A Sequential Test for Linear Hypotheses. Paul G. Hoel, University of California,
Los Angeles. 


A sequential test for the general linear hypothesis is obtained by employing methods 
similar to those introduced by Wald in deriving his sequential $t$ test. Optimum properties 
of the test are studied. An explicit expression for $p_{1n}/p_{0n}$ is obtained, the evaluation of 
which requires incomplete Gamma function tables. 
5. A Two Sample Test. Frank J. Massey, Jr., University of Oregon. 


Consider two (or more) samples arranged together in order of size and let $z_1 < z_2 < \cdots < z_k$
be the $a_1$th observation, the $a_2$th observation, etc., in order of size in the combined
samples. Let $n_{ij}$ be the number of observations in the $i$th sample which are greater than
$z_{j-1}$ and less than or equal to $z_j$. Then the joint distribution of the $n_{ij}$ is that of a
two-variable contingency table with fixed marginal totals. This is an extension of the result of
A. M. Mood and G. W. Brown for $k = 1$ (A. M. Mood, Introduction to the Theory of Statistics,
1950, p. 398). (This work was sponsored by the Office of Naval Research.) 


6. Some Slippage Problems for the Normal Distribution. (Preliminary Report.) 
Edward Paulson, University of Washington. 


Let $\{X_{i\alpha}\}$ $(\alpha = 1, 2, \cdots, n)$ be a sample of $n$ independent observations from $\Pi_i$ with
probability distribution $N(m_i, \sigma_i)$ $(i = 1, 2, \cdots, K)$, and let $\bar X_i = \sum_{\alpha=1}^{n} X_{i\alpha}/n$,
$s_i^2 = \sum_{\alpha=1}^{n} (X_{i\alpha} - \bar X_i)^2/(n - 1)$. First consider the problem of choosing one of the $K + 1$
hypotheses $H_0, H_1, \cdots, H_K$, when $H_0$ is the hypothesis the $K$ means are all equal, while
$H_j$ $(j = 1, \cdots, K)$ is the hypothesis the means are not all equal and $m_j = \max_i \{m_i\}$, when
it is known a priori that the $K$ standard deviations have a common known value $\sigma^*$. It is
shown that of all decision procedures with a fixed probability $\alpha$ of making the wrong
decision when $H_0$ is true, the procedure [select $H_j$ if $\bar X_j = \max_i \{\bar X_i\}$ and
$\bar X_j - \sum_{i=1}^{K} \bar X_i/K > \lambda_\alpha \sigma^*$, otherwise select $H_0$ ($\lambda_\alpha$ is a constant depending on $\alpha$)]
maximizes the probability of making the correct decision when one of the means has slipped
to the right, provided we restrict ourselves to decision procedures which are symmetric and
invariant under a change of location parameter. An analogous slippage problem is considered
with respect to the variances, and it is shown that the Cochran-Solomon-Eisenhart procedure
[select $H_j$ if $s_j^2 = \max_i \{s_i^2\}$ and $s_j^2/\sum_{i=1}^{K} s_i^2 \ge \lambda_\alpha$, otherwise select $H_0$] is in a
corresponding sense the 'best' possible. 


7. Minimax Procedures for Two-valued Decision Problems When the Size of 
the Sample Is Fixed. S. G. Allen, Jr., Stanford University. 


The problem considered is a minimax statistical decision procedure for choosing between
two alternative actions, $A_1$ and $A_2$, after taking $n$ independent observations on a random
variable $x$. The probability density $p(x, \theta)$ of $x$ is known except for the value of a real,
one-dimensional parameter $\theta$. The loss if decision $A_1$ is taken is zero for $\theta \le \theta_0$, and
$w_1(\theta) \ge 0$ otherwise; the loss if decision $A_2$ is taken is $w_2(\theta) \ge 0$ for $\theta < \theta_0$ and zero
otherwise. It is then shown that the likelihood ratio test is a minimax procedure if and only
if there exists a triple $(c, \theta_1, \theta_2)$, such that

\[ \max_\theta \{w_1(\theta) \Pr[\lambda(\theta_1, \theta_2) \le c \mid \theta]\} = \max_\theta \{w_2(\theta) \Pr[\lambda(\theta_1, \theta_2) > c \mid \theta]\}, \]

where $\theta_1 \le \theta_0 \le \theta_2$ and $\lambda(\theta_1, \theta_2) = \prod_{i=1}^{n} [p(x_i, \theta_2)/p(x_i, \theta_1)]$. For the exponential class
of distributions, sufficient conditions on the loss functions are found so that the above
criterion is met. Finally, modifications of the results for the case of discrete exponential
distributions are discussed. 


8. The Asymptotic Properties of Bayes Estimates. R. C. Davis, U. S. Naval 
Ordnance Test Station, China Lake. 


Let $X_1, X_2, \cdots$, etc., be independently and identically distributed chance variables
possessing the common cumulative distribution function $F(x, \theta)$, in which $F(x, \theta)$ admits
an elementary probability law $f(x, \theta)$ depending upon an unknown parameter $\theta$. Let also
$\lambda(\theta)$ be an assumed a priori distribution for $\theta$. $\theta$ may assume values in $\Omega$, a closed subset of
the real axis. Denote by $E(\theta \mid x_1, x_2, \cdots, x_n; \lambda)$ the conditional expected value of the
a posteriori distribution of $\theta$ given a sample $x_1, x_2, \cdots, x_n$ of values of $X_1, X_2, \cdots, X_n$.
This is termed the Bayes estimate of $\theta$. It has been surmised for several years that under
some set of regularity conditions every Bayes estimate is BAN (best asymptotically normal).
It is proved that this surmise is correct. It turns out that for each property, i.e.,
consistency, asymptotic normality, and asymptotic efficiency, the property of the Bayes
estimate can be established under regularity conditions quite similar to those assumed in
proving the respective property of the maximum likelihood estimate as treated by Doob
and Cramér. Moreover under conditions quite similar to those assumed by Wald to establish
the almost certain convergence of the maximum likelihood estimate, the Bayes estimate
may be shown to converge almost certainly to the true value of the unknown parameter $\theta$. 


9. Problems of Estimation and Hypothesis Testing in Connection with
Birth-and-Death Stochastic Processes. Eric R. Immel, University of California,
Los Angeles. 


Methods of estimation and of hypothesis testing are developed for the two parameters 
of the distribution of the integer-valued chance variable z(t) associated with the time- 
homogeneous birth-and-death process of the Markov type. For a certain class of processes 
maximum likelihood estimates are obtained which are unbiased estimates of the ratio of 
the parameters, and the estimates are efficient in the Cramér sense. More general estimates 
are obtained which are asymptotically optimum. The estimates yield confidence limits for 
the ratio of the parameters and for the parameters. The methods used yield approximate 
estimates for all processes of the type considered. A large-sample method of discriminating 
between two processes is discussed, as well as nonsequential and sequential methods for 
testing simple hypotheses against certain sets of alternatives. 


10. The Identifiability of n-dimensional Linear Structures. T. A. Jeeves,
University of California, Berkeley. 


Notation: Italic letters denote n-dimensional random vectors. Boldface letters denote 
n X n matrices of sure numbers. Consider the random vectors Y = S + U with S and U 
independent. Vector S is assumed to satisfy the condition S = XB. Definition: B is said to 
be identifiable if S = X*B* implies that B and B* are row equivalent. If U has a multi- 
normal distribution then B is unidentifiable if and only if S = ZC + V with rank C < rank 
B, where Z and V are independent and V has a multinormal distribution. 


11. A Probabilistic Study of Runs in Egg Production. (Preliminary Report.) 
Dorothy Cruden Lowry, University of California, Berkeley. 


This is a report on an analysis of the production records of some 450 hens of the same
age and subject to the same environment. The random variable studied, $X_i$, is assigned the
value 1 if the hen laid an egg on the $i$th day of a given period and 0 otherwise, and the
questions investigated are as follows. Are the variates $X_i$ identically and independently
distributed and, if not, what sort of stochastic model can be devised for which the
distribution agrees reasonably well with the observed production record? A test based on the total
number of runs in a 30-day period results in failure to reject the hypothesis of independence,
as does a test based on the serial correlation coefficient computed for families of hens.
However the test based on the total number of runs in a 60-day period results in rejection
of the hypothesis of independence in more cases than the chosen probability level would
indicate. As several simple stochastic models failed to fit the data, more complex models
are to be tried. 
12. Estimation in a Set of Distribution Functions of the Same Type. Robert R. 
Putz, University of California, Berkeley. 


Two distribution functions $\Phi^{(1)}$ and $\Phi^{(2)}$ are said to be of the same type in case there
exist constants $a$ and $b$ such that $\Phi^{(1)}(x) = \Phi^{(2)}(ax + b)$, with $a > 0$. A family $\mathfrak{F}$ of
identical-type distribution functions $\Phi_{\mu,\sigma}$, the mean $\mu$ and standard deviation $\sigma$ being components
of a random vector $\theta = (\mu, \sigma)$, is considered. For a random experiment in which there is
observed a distribution function selected from $\mathfrak{F}$ in accordance with the distribution of $\theta$,
knowledge of the latter distribution may be used to predict the observed distribution
function $\Phi_{\mu,\sigma}$. For any estimated distribution function $\Phi_{\hat\mu,\hat\sigma}$, the error
$\epsilon(x) = \Phi_{\hat\mu,\hat\sigma}(x) - \Phi_{\mu,\sigma}(x)$ is considered in relation to its dependence on the relative
parameter errors $\Delta_1 = (\hat\mu - \mu)/\mu$ and $\Delta_2 = (\hat\sigma - \sigma)/\sigma$. In the Gaussian case explicit
expressions are obtained for $\max_x |\epsilon(x)|$ and for the maximizing argument value $\xi$. Here in the
particular case $\Delta_2 = 0$ these become $(\xi - \mu)/\sigma = \Delta_1/2\delta$, where $\delta = \sigma/\mu$, and for small
$\Delta_1$, $\max_x |\epsilon(x)| = |\Delta_1|/\delta\sqrt{2\pi} + O(\Delta_1^3)$, while in the case $\Delta_1 = 0$, they reduce for
small $\Delta_2$ to $|\xi - \mu|/\sigma = 1 + \tfrac12\Delta_2 + O(\Delta_2^2)$ and $\max_x |\epsilon(x)| = \gamma|\Delta_2| + O(\Delta_2^2)$,
where $\gamma = 0.242 \cdots$. The difference $\epsilon(x) = \Phi_{\hat\mu,\hat\sigma}(x) - F_n(x)$, where $F_n$ is the distribution
function of a sample of size $n$ drawn from the distribution $\Phi_{\mu,\sigma}$, is considered. 


13. On the Practical Application of Confidence Intervals for the Mean of a 
Normal Population. Arthur Shapiro, Stanford University. 


Let $X_1, \cdots, X_n$ be independent normal variables having the same unknown mean $\xi$ and
the same, also unknown, variance $\sigma^2$. The shortest unbiased confidence interval for $\xi$,
corresponding to the confidence coefficient $\alpha$, is given by (1),
$\bar X - s t_\alpha/\sqrt{n - 1} \le \xi \le \bar X + s t_\alpha/\sqrt{n - 1}$, where $\bar X$, $s$, $t_\alpha$ have the usual meaning.
However, if the length of the confidence interval exceeds a certain limit, the practical value
of the result of estimation is nil. In all practical cases, formula (1) is used for estimating
$\xi$ only if $s t_\alpha/\sqrt{n - 1} \le r^*$. As a result, the probability that statement (1) regarding $\xi$
will be correct is not equal to the confidence coefficient $\alpha$. For fixed $r^*$, $\sigma$, $\alpha$, denote by
$P(r^*, \sigma, \alpha)$ the probability that (i), $s t_\alpha/\sqrt{n - 1} \le r^*$, and (ii), (1) will be correct.
Then $P(r^*, \sigma, \alpha)$ can be shown to be a monotonic function of $r^*/\sigma$. In some cases, the
experimenter may be sure that his $\sigma \le \sigma_0$. In these cases it is important to be able to
answer the question: what is the smallest value $n(\theta_0, \alpha, \alpha')$ such that
$n \ge n(\theta_0, \alpha, \alpha')$ implies $P(r^*, \sigma, \alpha) \ge \alpha'$, where $\theta_0 = r^*/\sigma_0$, $\alpha' < \alpha$. The purpose of
this work is to tabulate $n(\theta_0, \alpha, \alpha')$ for several combinations of $\alpha$ and $\alpha'$. For example, if
$\alpha = .95$ and $\alpha' = .90$ then for $r^*/\sigma_0 = 1.07$, $n(\theta_0, \alpha, \alpha') = 9$ and for $r^*/\sigma_0 = .200$,
$n(\theta_0, \alpha, \alpha') = 121$. 


14. Some Results Concerning Random Numbers of Random Variables.
(Preliminary Report.) Robert F. Tate, University of California, Berkeley. 


Two theorems concerning random numbers of random variables are stated and proved.
The first theorem asserts that if $X_i$ $(i = 1, 2, \cdots, \mu)$ and $Y_j$ $(j = 1, 2, \cdots, \nu)$ are sets of
observations each of which consists of independent, equi-distributed random variables,
where $\mu$ and $\nu$ are positive, integer-valued random variables, then, even assuming the $X_i$
independent of the $Y_j$, we have $\sum_{i=1}^{\mu} X_i$ and $\sum_{j=1}^{\nu} Y_j$ independent if and only if $\mu$ and $\nu$
are independent random variables. The generalization to $k$ sets of observations is immediate.
The second theorem constitutes a generalization of the Lindeberg-Lévy form of the
Central Limit Theorem for $k$-dimensional random vectors. The random vectors
$(X_n^{(1)}, \cdots, X_n^{(k)})$ are generalized to $(X_{\mu_{1n}}^{(1)}, \cdots, X_{\mu_{kn}}^{(k)})$, where $\mu_{in}$ are random variables
which take on the values $0, 1, 2, \cdots, n$, and whose probability distributions degenerate at
infinity. As $n \to \infty$ the Central Limit Theorem will now hold, with the same limiting normal
law as before. The proof follows from considerations involving the Continuity Theorem for
characteristic functions. 
15. Inspection Plans Which Improve Lot Quality. Zivia S. Wurtele, University 
of California, Berkeley. 


Assume that all defective items found during inspection of the lot are replaced by non- 
defective items and that when inspection is terminated the entire lot is accepted and used. 
Let the cost be linear in the number of items inspected, of defective items not removed, and 
of defective items replaced. Consider the Poisson limiting case; let t be the expected number 
of defective items in the lot. For any a priori distribution of $t$ the Bayes plan is
characterized by a set of stopping points $\{(d, x_d)\}$, $d = 0, \cdots, d_0$, where $d$ is the number of defective
items found in a proportion $x_d$ of the lot and where $x_{d+1} > x_d$. Inspection continues until a
stopping point is reached or until more than $d_0$ defective items are found, in which case the
entire lot is inspected. Methods are obtained for finding Bayes procedures and for calculating
the risk associated with any plan of this type. For any such plan there exists an a priori
distribution of $t$, other than a trivial one, with respect to which this plan is Bayes. 


16. Nonparametric Discrimination, II. Small-Sample Performance in Normal 
Populations. Evelyn Fix and J. L. Hodges, Jr., University of California, 
Berkeley. 


Consider the usual discrimination problem of assigning an individual I to one of two 
populations, from each of which a sample is available, on the basis of p measurements made 
on all individuals concerned. By defining a distance function in the p-dimensional space, 
and observing the population of origin of those sampled individuals ‘‘near’’ I, one may ob- 
tain a class of (sequential and nonsequential) nonparametric classification rules whose 
performance is asymptotically optimum as the sample sizes are increased, regardless of the 
populations being discriminated. In the present study, numerical results on the probabilities 
of misclassification are obtained, primarily for small samples from bivariate normal popu- 
lations. 


17. On Certain Estimators for the Parameters of the Distribution of Largest 
Values. Julius Lieblein, National Bureau of Standards. 


Estimators for the two parameters of the cdf of largest values, $\mathrm{Prob}\{X \le x\} =
\exp(-e^{-\alpha(x - u)})$, have been given by E. J. Gumbel and B. F. Kimball, who have studied 
their asymptotic behavior in infinitely large samples. This paper considers the behavior of 
estimators of one parameter when the other is known, and attempts to evaluate exactly 
the bias and efficiency for samples of finite size. This turns out to be possible in only a 
few situations, and it is found that numerical methods of approximation would be necessary 
for most cases. 


18. A Theorem on the Impossibility of Affine Resolvable Designs. S. S. Shrikhande,
Nagpur College of Science, India. 


The following theorem is proved. Suppose a Balanced Incomplete Block Design (B.I.B.D.)
with parameters $v = b = n^2t + n + 1$; $r = k = nt + 1$; $\lambda = t$ exists, and a B.I.B.D. with
parameters $v = b = n(n^2t + n + 1) + 1$; $r = k = n^2t + n + 1$; $\lambda = nt + 1$ does not exist,
then an Affine Resolvable B.I.B.D. (R. C. Bose, "A note on the resolvability of balanced
incomplete block designs," Sankhyā, Vol. 6 (1942), pp. 105-110) with parameters
$v = nk = n^2(n - 1)t + n^2$; $b = nr = n(n^2t + n + 1)$; $\lambda = nt + 1$ does not exist.

The proof depends upon the fact that the second design can be constructed if the other
two exist. Making use of a result given by Schützenberger ("A non-existence theorem
for an infinite family of symmetrical block designs," Annals of Eugenics, Vol. 14 (1949),
pp. 286-287) and others it is proved that an Affine Resolvable B.I.B.D. with parameters
$v = 63$, $b = 93$, $r = 31$, $k = 21$, $\lambda = 10$ does not exist. 
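
The parameter sets in this abstract can be checked arithmetically. The Python sketch below is an illustration with hypothetical function names, not part of the abstract. It computes the three parameter sets from $n$ and $t$ and verifies the standard counting identities $bk = vr$ and $\lambda(v - 1) = r(k - 1)$ of a balanced incomplete block design. These identities are only necessary conditions, so passing them says nothing about whether a design actually exists.

```python
def first_symmetric_design(n, t):
    """Parameters v = b = n^2 t + n + 1, r = k = n t + 1, lambda = t."""
    v = n * n * t + n + 1
    k = n * t + 1
    return {"v": v, "b": v, "r": k, "k": k, "lam": t}


def second_symmetric_design(n, t):
    """Parameters v = b = n(n^2 t + n + 1) + 1, r = k = n^2 t + n + 1, lambda = n t + 1."""
    k = n * n * t + n + 1
    v = n * k + 1
    return {"v": v, "b": v, "r": k, "k": k, "lam": n * t + 1}


def affine_resolvable_design(n, t):
    """Parameters v = nk = n^2 (n - 1) t + n^2, b = nr = n(n^2 t + n + 1), lambda = n t + 1."""
    k = n * ((n - 1) * t + 1)
    r = n * n * t + n + 1
    return {"v": n * k, "b": n * r, "r": r, "k": k, "lam": n * t + 1}


def satisfies_counting_identities(d):
    """Check the necessary conditions bk = vr and lambda (v - 1) = r (k - 1)."""
    return (d["b"] * d["k"] == d["v"] * d["r"]
            and d["lam"] * (d["v"] - 1) == d["r"] * (d["k"] - 1))


if __name__ == "__main__":
    # n = 3, t = 3 reproduces the numerical case quoted in the abstract.
    for design in (first_symmetric_design, second_symmetric_design, affine_resolvable_design):
        params = design(3, 3)
        print(design.__name__, params, satisfies_counting_identities(params))
```

With $n = 3$ and $t = 3$ the affine resolvable parameter set is the case $v = 63$, $b = 93$, $r = 31$, $k = 21$, $\lambda = 10$ quoted above.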







19. The Nonexistence of Difference Sets for Group Designs. S. S. Shrikhande,
Nagpur College of Science, India. 

The following theorem is proved. Let $v = mn$, where $n$ is a prime congruent to 3 (mod 4).
Let nonnegative integers $\lambda_1$ and $\lambda_2$ satisfy the relation $k(k - 1) = (m - 1)\lambda_1 + (n - 1)m\lambda_2$.
Define $\theta = k + \lambda_1(m - 1) - \lambda_2 m$ and let $p$ be a prime factor of $\theta$ occurring in it to an odd
degree. Then if $(-n/p) = -1$, where $(q/p)$ stands for the Legendre symbol, there does not exist
a difference set of $k$ integers which gives rise to a group design (K. R. Nair and C. R. Rao,
"A note on partially balanced incomplete block designs," Science and Culture, Vol. 7
(1942), pp. 615-616) with parameters $v = b = mn$, $r = k$, $\lambda_1$, $\lambda_2$, where there are $n$ groups of
$m$ treatments each, in $b$ blocks of size $k$, such that each pair of varieties from the same group
occurs in $\lambda_1$ blocks while each pair of varieties coming from different groups occurs in $\lambda_2$ blocks.
This generalizes a result of Chowla ("On difference sets," Proc. Nat. Acad. Sci., Vol. 35
(1949), pp. 92-94), the proof following along the lines of his paper. 


20. Concerning Large-Sample Tests and Confidence Intervals for Mortality 
Rates. John E. Walsh, Bureau of the Census. 

This is an extension of the paper "Large-sample tests and confidence intervals for
mortality rates" which appeared in the June, 1950 issue of the Journal of the American Statistical 
Association. The results of this other article are placed on an axiomatic basis and the 
validity of these axioms is discussed. The basic underlying concepts are explained and some 
numerical examples of applications are worked out. Also additional significance tests and 
confidence intervals are presented. 




NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest. 


Personal Items 


The Shewhart Medal for outstanding service and leadership in the field of 
quality control was awarded to Dr. Martin A. Brumbaugh, Director of Statistics 
at Bristol Laboratories, Inc., Syracuse, New York, at the convention of the 
American Society for Quality Control in Cleveland May 23 and 24. Dr. Brum- 
baugh is a founder of the American Society for Quality Control and at this time 
is first vice-president of the group. 

Dr. W. Edwards Deming visited Japan in July, 1950, and delivered a number 
of lectures and conducted two 8-day courses in quality control in Tokyo and 
Fukuoka. Considerable interest in statistical methods and quality control among 
Japanese engineers and industrialists has arisen, largely as a result of this visit. 

Mr. John C. Hintermaier, formerly Superintendent of Development, Standards 
Testing Laboratories, Cluett Peabody & Co., Research Division, Troy, New 
York, is now with the Textile Materials Engineering Laboratory, Philadelphia 
Quartermaster Depot, Philadelphia 45, Pennsylvania. 

Mr. Roy R. Kuebler, Jr. is returning to his position of Associate Professor of 
Mathematics at Dickinson College, Carlisle, Pennsylvania, after a year’s leave. 

Mr. Marvin Masel has resigned his position as Engineering Statistician with the 
Goodyear Aircraft Corporation at Akron, Ohio, to accept employment with the 
Carrier Equipment Planning and Development Organization, Western Electric 
Company, Kearney, New Jersey. 

Dr. C. A. Metzner has transferred from the Survey Research Center at the 
University of Michigan, where he served as Study Director, to the School of 
Public Health to direct the social program. His new title will be Research As- 
sociate of the Bureau of Health Economics. 

Mr. W. E. Patte is now employed in the Control Department of the Lauren- 
tide Division, Consolidated Paper Corporation, Ltd., Grand’Mere, Quebec. He 
formerly held the position as Senior Chemist at the E. B. Eddy Co., Hull, Quebec. 

Mr. Bernard E. Phillips, who has been with the Operations Research Office 
of the Johns Hopkins University for the last one and a half years doing work for 
the Army as a mathematical statistician, has accepted a position with the 
Parsons-Aerojet Co. His work is in the evaluation of trajectory-measuring in- 
strumentation systems as Statistical Engineer. 

Professor Paul R. Rider is on leave of absence from Washington University to 
act as Mathematical and Statistical Adviser to the Chief of Air Research, Wright- 
Patterson Air Force Base, Dayton, Ohio. 

Dr. Harry G. Romig, formerly a member of the Technical Staff of the Bell 
Telephone Laboratories, Inc., has accepted a position as Staff Engineer, Hughes 
Aircraft Co., Culver City, California. 

Dr. Monroe G. Sirken has been a Research Associate in the Statistical Laboratory,
University of California, during the past academic year, while a postdoctoral
fellow of the Social Science Research Council. He has accepted a position
as Assistant Professor and Research Associate in the Survey Research Center
at the University of Michigan. 

Mr. E. Webb Stacy, formerly of the University of North Carolina, has been 
appointed Assistant Professor of Statistics in the Department of Statistics of the 
College of Commerce and Industry, University of Wyoming. He has recently 
been elected First Vice President of the Denver, Colorado, Chapter of the 
American Statistical Association for the coming year. 

Dr. William F. Taylor has been appointed Chief, Department of Biometrics, 
at the School of Aviation Medicine, Randolph Air Force Base, Texas. He had 
been with the School of Public Health, University of California, serving as an 
Associate. 

Mr. Dan Teichroew, as an employee of the Institute of Statistics, Raleigh, 
North Carolina, will be in Washington for some special work with the National 
Bureau of Standards beginning July 1, 1951. 

Dr. John E. Walsh, formerly with the Rand Corporation, is now a Consultant 
with the U. S. Bureau of the Census, Washington, D. C. 

Professor Jacob Wolfowitz joined the Department of Mathematics at Cornell 
University in July, 1951. 

Mr. Roger D. Keeney of the Metropolitan Life Insurance Co. died of polio 
last September. 
The Educational Testing Service is offering for 1952-53 its fifth series of 
research fellowships in psychometrics leading to the Ph.D. degree at Princeton 
University. Open to men who are acceptable to the Graduate School of the 
University, the two fellowships each carry a stipend of $2,375 a year and are 
normally renewable. 

Fellows will be engaged in part-time research in the general area of psychologi- 
cal measurement at the offices of the Educational Testing Service and will, in 
addition, carry a normal program of studies in the Graduate School. Competence 
in mathematics and psychology is a prerequisite for obtaining these fellowships. 
The closing date for completing applications is January 18, 1952. Information 
and application blanks will be available about November 1 and may be obtained 
from: Director of Psychometric Fellowship Program, Educational Testing Serv- 
ice, 20 Nassau Street, Princeton, New Jersey. 




New Members 


The following persons have been elected to membership in the Institute 
(March 1, 1951 to May 31, 1951) 


Abrams, George W., M.A. (Columbia Univ.), Member of Technical Staff, Bell Telephone 
Laboratories, 306 W. 92nd Street, New York 25, New York. 

Bagg, John L., M.S. (Univ. of Ill.), Graduate Assistant, Mathematics Department, Michi- 
gan State College, 403 B, Willow Lane, E. Lansing, Michigan. 

Baranow, Lotty v., Ph.D. (Univ. Berlin), Licensed expert in Insurance Science, Economic 
Mathematics and Mathematical Statistics, Nymphenburgerstrasse, 36 III, Munich 2, 
Germany, U. S. Zone. 

Bicking, Charles A., M.S. (Mass. Inst. of Technology), Ordnance Research and Develop- 
ment Division, Office of the Chief of Ordnance, Washington, D. C. 

Billingsley, Patrick, B.S. (U.S. Naval Academy), Graduate Student, Princeton University, 
Box 708, Fine Hall, Princeton, New Jersey. 

Bjerve, Petter J., Director of the Central Bureau of Statistics, Dronningensgate 16, Oslo, 
Norway. 

Bott, George, S.M. (Mass. Inst. of Technology), D.I.C. Staff Member, Massachusetts 
Institute of Technology, 1728 Queens Lane, Apt. 180, Colonial Village, Arlington, 
Virginia. 

Brown, Mrs. L. T. (Bernice), M.S. (Iowa State College), Research Statistician, Rand 
Corporation, 1500 Fourth Street, Santa Monica, California. 

Burkhardt, Felix, Ph.D., Professor of Mathematical Statistics, University of Leipzig, 
Mehrinzstrasse 16, Markkleeberg bei Leipzig, Germany. 

Cohen, Doris H., B.A. (Hunter College), Graduate Student, University of Michigan, 252 
Alice Lloyd Hall, Ann Arbor, Michigan. 

Culabutan, Paz B., B.S. (Univ. of Philippines), Researcher, General Trias, Cavite, 
Philippines. 

Danford, Masil B., M.A. (Univ. of Texas), Graduate Student, Institute of Statistics, 
North Carolina State College, Raleigh, North Carolina. 

Dowd, John E., B.C. (Univ. of Manitoba), Graduate Student, University of Manitoba, 
290 Langside Street, Winnipeg, Manitoba, Canada. 

Easterbrook, Marjorie J., A.M. (Univ. of Mich.), Graduate Student, University of Michi- 
gan, 83 E. Girard Boulevard, Kenmore 17, New York. 
Engel, Jerome N., B.S. (Univ. of Mich.), Graduate Student, University of Michigan, 711 
Haven Avenue, Ann Arbor, Michigan. 

Fend, Alvin V., M.S. (Univ. of Ill.), Assistant, Mathematics Department, University of 
Illinois, 910 College Court, Apt. 609, Urbana, Illinois. 

Gamoneda, Ramon G., Ph.D., Assistant Professor of Financial Mathematics and Business 
Statistics, University of Havana, Presidente Zayas N-152, Havana, Cuba. 

Gordon, Eugene S., B.S. (Univ. of Mich.), Assistant, Statistical Research Laboratory, 
University of Michigan, 914 S. State Street, Ann Arbor, Michigan. 

Gourrich, George E., M.S. (Univ. of Calif. at L.A.), Research Engineer, Northrop Aircraft, 
Inc., Hawthorne, California, 12830 Parkyns Street, Los Angeles 49, California. 

Graf, Ulrich, Ph.D., Professor of Engineering and Lecturer in the Philosophy-Theological 
School, Bamberg, Rubensstrasse 4, (22a) Wuppertal-Vohwinkel, Germany. 

Harrington, Gordon M., B.E.E. (Georgia Tech.), Research Associate, Educational Re- 
search Corporation, 10 Craigie Street, Cambridge 38, Massachusetts. 

Hood, William C., Ph.D. (Univ. of Toronto), Assistant Professor of Economics, Depart- 
ment of Political Economy and Special Lecturer in Statistics, Institute of Business 
Administration, University of Toronto, 273 Bloor Street, W., Toronto, Canada. 

Jennings, Walter, M.A. (Ohio State Univ.), Associate Professor of Mathematics and
Mechanics, U. S. Naval Postgraduate School, 919 Creek Drive, Annapolis, Maryland. 

Johnson, Ralph B., M.A. (Columbia Univ. & Univ. of Tenn.), Instructor of Mathematics, 
Clemson Agricultural College, 118 Edgewood Avenue, Clemson, South Carolina. 

Leader, Virginia A., B.A. (Univ. of Mich.), Graduate Student, University of Michigan, 
Alice Lloyd Hall, Ann Arbor, Michigan. 

Lindquist, Gerald E., B.A. (Univ. of Mich.), Assistant to the Actuary, Michigan
Department of Insurance, 1341 Alexander Street, S.E., Grand Rapids 6, Michigan. 

Magwire, Craig A., M.S. (Univ. of Mich.), Graduate Student and Teaching Assistant, 
Stanford University, 315-12 Stanford Village, California. 

Malach, Robert J., M.A. (Univ. of Ill.), Graduate Student, University of Illinois, 1112 W. 
Oregon, Urbana, Illinois. 

Mauldon, J. G., M.A. (Oxford Univ.), Fellow, Tutor and Lecturer, Corpus Christi College, 
Oxford, England. 

Muller, Mervin E., M.A. (Univ. of Calif. at L.A.), Graduate Student and Teaching Assist- 
ant, University of California at Los Angeles, 1061 N. Sierra Bonita Ave., Hollywood 46, 
California. 

Nassimbene, Raymond, A.B. (Univ. of Denver), Economist, Office of Business Economics, 
U. S. Department of Commerce, 3709 Northampton, N.W., Washington, D. C. 

Probst, David A., M.S. (Univ. of Pittsburgh), Graduate Student, University of North 
Carolina and Geologist, Gulf Research & Development Company, Pittsburgh, P. O.
Drawer 2038, Pittsburgh 30, Pennsylvania.

Razgunas, Leo, B.S. (Univ. of Mich.), Graduate Student, Department of Mathematics,
University of Michigan, 545 Packard Avenue, Ann Arbor, Michigan. 

Richter, Hans, Ph.D. (Univ. of Leipzig), Professor, University of Freiburg, Elektraweg 2, 
(17b) Haltingen Krs., Lorrach, Germany. 

Roberts, Helen M., A.M. (Boston Univ.), Assistant Professor of Mathematics, University 
of Connecticut, Storrs, Connecticut. 

Roshwalb, Irving, M.A. (Columbia Univ.), Statistician, Opinion Research Corporation, 
44 Nassau Street, Princeton, New Jersey, 721 Walton Avenue, New York 61, New York. 

Sasso S., Roberto, Chief, Machine Tabulation Section, Direccion Gral. de Estadistica, 
P.O. Box No. 186, San Jose, Costa Rica. 

Schiller, Nathan, B.A. (Univ. of Mich.), Graduate Student, Department of Mathematics, 
University of Michigan, 407 E. Liberty, Ann Arbor, Michigan. 

Shank, Ellsworth B., B.S. (Franklin & Marshall College), Analytical Statistician, Ballistics 
Research Laboratory, Aberdeen Proving Ground, Maryland, 65 South Grant Street, 
Manheim, Pennsylvania. 

Smith, Harry, Jr., M.A. (Univ. of Del.), Graduate Student in Mathematical Statistics,
University of North Carolina, 1 E. Lake Street, Middletown, Delaware. 

Stanley, J. Perham, M.S. (Toronto), Assistant Statistician, Abitibi Power & Paper Com- 
pany, Ltd., 408 University Avenue, Toronto, Canada. 

Symons, Nancy, B.A. (Univ. of Mich.), Graduate Student, Department of Mathematics, 
Martha Cook Building, Ann Arbor, Michigan. 

Toranzos, Fausto I., Ph.D. (Univ. of LaPlata, Argentina), Professor of Statistics, Univer- 
sity of Cuyo, Luzuriaga 267, Mendoza, Argentina. 

Van Vliet, Clement J., M.A. (Univ. of Calif. at L. A.), Statistician, U.S. Navy Electronics 
Laboratory, San Diego 52, California. 

Wegner, Louis H., Jr., B.S. (Oregon State College), Research Assistant, Department of 
Mathematics, University of Oregon, 2227-3 Patterson, Eugene, Oregon. 

Wheeler, Ruric E., M.S. (Univ. of Kentucky), Instructor and Student, Mathematics De- 
partment, University of Kentucky, Lexington, Kentucky. 

Wilde, Richard D., B.S. (Iowa State College), Engineer, Divisional Quality Control, 
Cathode Ray Tubes, Sylvania Electric, 25 Troy Street, Seneca Falls, New York. 

Wishart, David M. G., B.Sc. (Univ. of St. Andrews, Scotland), Graduate Student and 
Assistant in Research, Department of Mathematics, Princeton University, Graduate 
College, Princeton, New Jersey. 

Yeilding, Richard P., M.A. (Oklahoma Univ.), Graduate Student, Institute of Statistics, 
University of North Carolina, Glen Lennor, 43-A, Chapel Hill, North Carolina. 

Zaluar-Nunes, Manuel, B.Sc. (London), President, Sociedade Portuguesa de Matematica,
Rua Serpa Pinto 17, Lisbon, Portugal. 

Zillig, Quentin V., M.S. (Marquette Univ.), Experimental Engineer, Ladish Co., Cudahy, 
Wisconsin, 1924 Forest Street, Wauwatosa 13, Wisconsin. 


Correction to ‘‘News and Notices, New Members,’’ March, 1951: 
Wunsche, Gunther, Dr. rer. Techn. (Techn. Univ. of Dresden), Chefmathematiker and
Prokurist, Universitätsdozent, Lenbachplatz 4, Munich 2, Germany.




REPORT OF THE SANTA MONICA MEETING OF THE INSTITUTE 


The forty-seventh meeting (seventh West Coast meeting) of the Institute 
was held in Santa Monica, California, June 15th and 16th, 1951, at the offices 
of The Rand Corporation. The Biometric Society (Western North American 
Region) joined the Institute in sponsoring the meeting. Ninety-eight persons 
registered for the meeting, including the following forty-eight members of the 
Institute: 


Leo A. Aroian, Kenneth J. Arrow, G. A. Baker, Blair M. Bennett, David Blackwell, 
Julius R. Blum, Bernice Brown, George W. Brown, Thomas A. Budne, Douglas G. Chap- 
man, Edwin L. Crow, J. H. Curtiss, R. C. Davis, George Gerard Denbroeder, W. J. Dixon, 
Mary Elveback, Edward A. Fay, Evelyn Fix, Robert S. Gardner, M. A. Girshick, Jack C. 
Gysbers, T. E. Harris, J. L. Hodges, Jr., Paul G. Hoel, W. C. Hoffman, John M. Howell, 
Eric R. Immel, T. A. Jeeves, M. G. Kendall, Dorothy Cruden Lowry, A. W. Marshall, 
Frank J. Massey, Ray Mickey, A. M. Mood, L. E. Moses, Mervin Muller, Melvin P. Peisa- 
koff, R. P. Peterson, Joseph Putter, R. R. Putz, Harry G. Romig, Herman Rubin, Eliza- 
beth L. Scott, R. W. Shephard, R. C. Stillinger, Robert F. Tate, D. van Dantzig, and Mrs. 
Zivia S. Wurtele. 







At 10:00 A.M. Friday, Professor Mary Elveback presided at a session on 
Applications of Statistics in Biology, consisting of the following papers: 


1. Uniformity Yield Trials with Strawberries. G. A. Baker and R. E. Baker, University 
of California, Davis. 
2. Inverse and Multiple Zoological Sample Censuses. D. G. Chapman, University of 
Washington. 
3. Multiple Regression Analysis of Soil Data. C. H. Wadleigh, U.S. Salinity Laboratory, 
Riverside. 
4. Genetic and Environmental Sources of Variation in Length of Gestation Period for Horse.
W. C. Rollins and C. E. Howell, University of California, Davis.
5. Estimating Tolerance Limits for the Age Composition of a Population of Fish as Derived
from Random Samples within Length Strata. T. M. Widrig, U. S. Fish and Wildlife
Service, Stanford.


At 2:00 P.M. Friday, Professor David Blackwell of Howard University and 
Stanford University presided at a session consisting of an invited address and 
three contributed papers as follows: 


1. Some Aspects of the Theory of Stochastic Processes in Absorbing Media (one hour). 
D. van Dantzig, University of Amsterdam. 

2. First Passage in Random Walks. T. E. Harris, The Rand Corporation. 

3. Distinct Hypotheses and Convex Sets. Lucien M. Le Cam, University of California, 
Berkeley. 

4. Uniform Convergence of Random Functions with Applications to Statistics. Herman 
Rubin, Stanford University. 


At 4:00 P.M. Friday, George W. Brown of The Rand Corporation presided at 
a session on Statistics in Medical Research and Public Health, consisting of the 
following papers: 


1. Sampling of Ophthalmological Data. F. W. Weymouth, Stanford University, and M. J.
Hirsch, Los Angeles College of Optometry.

2. Human Heart Weight: A Study Based on 20,000 Autopsies. Emil Bogen, Olive View 
Sanatorium. 

3. Estimation of Median Effective Dose by Weighted Moving Average Methods. B. M. Ben- 
nett, School of Medicine, Seattle. 

4. A Graphic Estimate of Means and Standard Deviations. F. B. Cramer, University of
Southern California.

5. Law of Mass Action as Applied to Tetanus Incidence. J. T. Oliver, Los Angeles County
Health Department.


At 10:00 A.M. Saturday, Professor J. Neyman of the University of California, 
Berkeley, presided at a session consisting of an invited address and three con- 
tributed papers as follows: 


1. Current Trends in Statistical Work in the United Kingdom (one hour). M. G. Kendall, 
London School of Economics. 
2. A Sequential Test for Linear Hypotheses. Paul G. Hoel, University of California, Los 
Angeles. 
3. A Two Sample Test. Frank J. Massey, Jr., University of Oregon.
4. Some Slippage Problems for the Normal Distribution. Preliminary Report. Edward
Paulson, University of Washington.


At 2:00 P.M. Saturday, Professor Paul G. Hoel of the University of California,
Los Angeles, presided at a session consisting of an invited address: 


Optimum Classes of Strategies (one hour). David Blackwell, Howard University and 
Stanford University, and M. A. Girshick, Stanford University. 


At 3:15 P.M. Saturday, Professor J. L. Hodges, Jr., of the University of Cali- 
fornia, Berkeley, presided at a session consisting of the following contributed 
papers: 


1. Minimax Procedures for Two-valued Decision Problems when the Size of the Sample
Is Fixed. S. G. Allen, Jr., Stanford University.
2. The Asymptotic Properties of Bayes Estimates. R. C. Davis, U. S. Naval Ordnance
Test Station, China Lake.
3. Problems of Estimation and Hypothesis Testing in Connection with Birth-and-Death
Stochastic Processes. Eric R. Immel, University of California, Los Angeles.
4. The Identifiability of n-dimensional Linear Structures. T. A. Jeeves, University of
California, Berkeley.
5. A Probabilistic Study of Runs in Egg Production. Preliminary Report. Dorothy
Cruden Lowry, University of California, Berkeley.
6. Estimation in a Set of Distribution Functions of the Same Type. Robert R. Putz,
University of California, Berkeley.
7. On the Practical Application of Confidence Intervals for the Mean of a Normal Popula-
tion. Arthur Shapiro, Stanford University.
8. Some Results Concerning Random Numbers of Random Variables. Preliminary Report.
Robert F. Tate, University of California, Berkeley.
9. Inspection Plans which Improve Lot Quality. Zivia S. Wurtele, University of Cali-
fornia, Berkeley.
10. Nonparametric Discrimination, II. Small-Sample Performance in Normal Populations.
(By title.) Evelyn Fix and J. L. Hodges, Jr., University of California, Berkeley.
11. On Certain Estimators for the Parameters of the Distribution of Largest Values. (By
title.) Julius Lieblein, National Bureau of Standards.
12. A Theorem on the Impossibility of Affine Resolvable Designs. (By title.) S. S. Shrik-
hande, Nagpur College of Science, India.
13. The Nonexistence of Difference Sets of Group Designs. (By title.) S. S. Shrikhande,
Nagpur College of Science, India.
14. Concerning Large-Sample Tests and Confidence Intervals for Mortality Rates. (By title.)
John E. Walsh, Bureau of the Census.


The West Coast program committee met at noon Saturday and decided among 
other things to have the next West Coast meeting at the University of Oregon 
in June, 1952. 

Those attending the meetings were entertained by the Rand members of the 
Institute with a cocktail party and buffet luncheon at the home of Mr. and Mrs. 
J. D. Williams on Friday evening. 


A. M. Mood
Assistant Secretary 


PUBLICATIONS RECEIVED


Cansado, Enrique, Conferencias sobre Muestreo Estadistico, Instituto Nacional de Esta-
distica, Presidencia del Gobierno, Madrid, 1950, xii + 240 pp.

Chung, J. H., and DeLury, D. B., Confidence Limits for the Hypergeometric Distribution,
published for Ontario Research Foundation by University of Toronto Press, 1950,
xiii pp. + 72 graphs, $2.25.

DeLury, Daniel B., Values and Integrals of the Orthogonal Polynomials up to n = 26, pub-
lished for Ontario Research Foundation by University of Toronto Press, 1950, v + 33
pp., $1.25.

Dixon, Wilfrid J., and Massey, Frank J., Jr., Introduction to Statistical Analysis, Mc-
Graw-Hill Book Company, Inc., 1951, x + 370 pp., $4.50.

National Bureau of Standards, Tables Relating to Mathieu Functions, Columbia
University Press, 1951, xlvii + 278 pp., $8.00.

VI Zjazd Matematyków Polskich, Warszawa, 20-23 IX 1948, (Dodatek do Rocznika Polskiego
Towarzystwa Matematycznego, T. XXII), Instytut Matematyczny Uniwersytetu
Jagiellońskiego, Kraków, 1950, 106 pp.


Statistics in Production and Inspection ..... EDWIN G. OLDS
The Verification and Scoring of Weather Forecasts ..... IRVING I. GRINGORTEN
Some Statistical Problems in Small Group Research ..... ROBERT F. BALES
Relations between Prices, Consumption, and Production ..... KARL A. FOX
The Distribution of the Range in Samples from a Discrete Rectangular Population
..... PAUL R. RIDER
Actuarial Science ..... CHARLES A. SPOERL
Statistical Measurement and Economic Mobilization Planning ..... GLENN E. MCLAUGHLIN
National Income ..... GEORGE JASZI
A Large-Sample Test of the Hypothesis that One of Two Random Variables Is Stochasti-
cally Larger than the Other ..... ANDREW W. MARSHALL


REPRINTS OF ABSTRACTS IN STATISTICAL METHODOLOGY 
BOOK REVIEWS 
The American Statistical Association invites as members all per- 
sons interested in: 
1. development of new theory and method 


2. improvement of basic statistical data 
3. application of statistical methods to practical problems. 


BIOMETRIKA 
A Journal for the Statistical Study of Biological Problems 


Volume 38 Contents Parts 1 and 2, June 1951 


1. Major Greenwood (with portrait). By P. L. McKINLAY. 2. Partial and multiple rank correlation. 
By P.A.P. MORAN. 3. Some questions of distribution in the theory of rank correlation. By S.T. DAVID, 
M. G. KENDALL, and A. STUART. 4. On distributions for which the Hartley-Khamis solution of the 
moment-problem is exact. By H. P. MULHOLLAND. 5. The effect of non-normality on the power func- 
tion of the F-test in the analysis of variance. By F. N. DAVID and N. L. JOHNSON. 6. Regression 
structure and functional relationship. By M. G. KENDALL. 7. An application of the distribution of 
ranking concordance coefficient. By A. STUART. 8. Some tests for randomness in plant populations. 
By MARJORIE THOMAS. 9. The geometry of estimation. By J. DURBIN and M. G. KENDALL. 
10. The frequency distribution of the product-moment correlation coefficient in random samples of any size 
drawn from non-normal universes. By A. K. GAYEN. 11. Note on the exact treatment of contingency,
goodness of fit and other problems of significance. By G. H. FREEMAN and J. H. HALTON. 12. Effi-
ciency of the method of moments and the Gram-Charlier type A distribution. By L. R. SHENTON. 13.
Tables of the 5 and 0.5% points of Pearson curves (with argument β₁ and β₂) expressed in standard measure.
By E. S. PEARSON and M. MERRINGTON. 14. Random dispersal in theoretical populations. By J.
G. SKELLAM. 15. Estimation problems when a simple type of heterogeneity is present in the sample.
By W. M. LONG. 16. The Jacobians of certain matrix transformations useful in multivariate analysis (based
on P. L. Hsu's lectures). By W. L. DEEMER and I. OLKIN. 17. Testing for serial correlation in least
squares regression, II. By J. DURBIN and G. S. WATSON. 18. Bi-variate k-statistics and cumulants
of their joint sampling distribution. By M. B. COOK. 19. Charts of the power function for analysis of
variance tests, derived from the non-central F-distribution. By E. S. PEARSON and H. O. HARTLEY.
20. A chart for the incomplete Beta-function and the cumulative binomial distribution. By H. O. HART-
LEY and E. R. FITCH. 21. MISCELLANEA. 22. REVIEWS.


The subscription price, payable in advance, is 45s. inland, 54s. export (per volume including postage). Cheques
should be drawn to Biometrika and sent to “The Secretary, Biometrika Office, Department of Statistics,
University College, London, W.C. 1.” All foreign cheques must be in sterling and drawn on a bank
having a London agency.





ECONOMETRICA 


Journal of the Econometric Society 


Contents of Vol. 19, July, 1951, include: 


ROSS M. ROBERTSON ..... Jevons and His Precursors
KENNETH J. ARROW, THEODORE HARRIS, AND JACOB MARSCHAK
..... Optimal Inventory Policy
GERARD DEBREU ..... The Coefficient of Resource Utilization
HERBERT A. SIMON ..... A Formal Theory of the Employment Relationship
ROBERT SOLOW ..... A Note on Dynamic Multipliers
JULIAN L. HOLLEY ..... Note on the Inversion of the Leontief Matrix
REPORT OF THE CHICAGO MEETING, December 27-30, 1950 
Book Reviews, Letter to the Editor, Announcements of Meetings. 


Published Quarterly Subscription to Nonmembers: $9.00 per year 


The Econometric Society is an international society for the advancement of economic theory in its 
relation to statistics and mathematics 


Subscriptions to Econometrica and inquiries about the work of the Society and the procedure in applying 
for membership should be addressed to William B. Simpson, Secretary, The Econometric Society, The 
University of Chicago, Chicago 37, Illinois, U. S. A.





MATHEMATICAL REVIEWS 


A journal containing reviews of the mathematical liter- 
ature of the world, with full subject and author indices 


Publication of this journal is sponsored by the American Mathe- 
matical Society, Mathematical Association of America, Institute of 
Mathematical Statistics, London Mathematical Society, Edinburgh 
Mathematical Society, Union Matematica Argentina, and others. 


Subscriptions accepted to cover the calendar year only. 
Issues appear monthly except July. $20.00 per year. 
Send subscription order or request for sample copy to 


AMERICAN MATHEMATICAL SOCIETY 
531 West 116th Street, New York City 27 





ROYAL STATISTICAL SOCIETY 


SPECIAL REPRINTS 


SYMPOSIUM ON STOCHASTIC PROCESSES: 
Stochastic Processes and Statistical Physics, J. E. MOYAL
Some Evolutionary Stochastic Processes, M.S. BARTLETT 
Stochastic Processes and Population Growth, D.G. KENDALL 
(With Discussion on the papers) 
Price, post free 12s. 6d. 


TABLES OF SEQUENTIAL INSPECTION SCHEMES TO 
CONTROL FRACTION DEFECTIVE, F. J. ANSCOMBE
Price, post free, 2s. 6d. 


These papers, published in the Journal of the Royal Statistical Society,
1949, have now been issued in reprint form. Copies may be obtained
direct from


The Royal Statistical Society, 
4, Portugal Street, 
London, W.C.2. 





SKANDINAVISK 
AKTUARIETIDSKRIFT 


1950 - Parts 3 - 4 
Contents 


MARTIN WEIBULL
The Distribution of the t and z Variables in the Case of Stratified Sample
with Individuals Taken from Normal Parent Populations with Varying
Means
A. HALD AND S. A. SINKBÆK ..... A Table of Percentage Points of the χ²-Distribution
K.-G. HAGSTROEM ..... Risk Theory and Group Insurance
LARS DAHLGREN
A Theorem on Translations by Hille, and Its Interpretation from the Point
of View of the Theory of Probability
J. F. STEFFENSEN ..... More about Invalidity Functions
TORE DALENIUS ..... The Problem of Optimum Stratification
STEN MALMQUIST
On a Property of Order Statistics from a Rectangular Distribution


Annual subscription: 10 Swedish Crowns (Approx. $2.00).
Inquiries and orders may be addressed to the Editor, 


SKARVIKSVAGEN 7, DJURSHOLM (SWEDEN) 





SANKHYA 


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 


Vol. 10, Part 4, 1950 


Completeness, Similar Regions, and Unbiased Estimation
E. L. LEHMANN AND HENRY SCHEFFÉ
The Estimation of the Mean of a Normal Tolerance Distribution ..... D. J. FINNEY
Sequential Tests of Null Hypotheses ..... C. RADHAKRISHNA RAO
Some Contributions to Hotelling’s Weighing Designs ..... K. S. BANERJEE
A Note on the Marginal and the Optimum Size of Holding in Bengal ..... A. GHOSH
A Statistical Study on Multiple Cases of Disease in Households
K. K. MATHEN AND P. N. CHAKRABORTY
A Note on the “Report on an Enquiry into the Family Budgets of Middle Class
Employees of the Central Government” ..... MAHESH CHAND
Power Function of Chi-Square Test with Special Reference to Analysis of Blood
Group Data ..... S. JANARDAN POTI
Book Reviews 


Annual subscription: 30 rupees
Inquiries and orders may be addressed to the 
Editor, Sankhya, Presidency College, Calcutta, India. 